Add Gateway 2.0 implementation to Cosmos driver #4319
Draft
tvaron3 wants to merge 46 commits into Azure:release/azure_data_cosmos-previews from
Add design specification for Gateway 2.0 (formerly thin client) support in the Rust Cosmos driver and SDK. The spec covers motivation, current Rust state, a phased implementation plan (RNTBD protocol, request pipeline, endpoint discovery, retry/errors, SDK integration, and a dedicated live tests pipeline), and remaining open questions. Gateway 2.0 is auto-detected from account metadata and is not exposed as a customer-facing configuration option. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses all 21 findings from the PR Deep Reviewer run on Azure#4223:
- Fix EPK attribution: canonical path is the driver's EffectivePartitionKey::compute/compute_range. Flag the SDK-side get_hashed_partition_key_string as known-broken for MultiHash; it must not be wired to Gateway 2.0 header injection.
- Resolve stored-procedure contradiction: follow Java (no SP via thin client); explicitly reject .NET's ExecuteJavaScript allowance.
- Keep RNTBD field widths as uint16 LE, cite Java RntbdRequestFrame.encode's writeShortLE. Reviewer's uint32 claim was incorrect for thin client.
- Add single-source-of-truth gating model (section 3.4) with invariants.
- Add fallback taxonomy (eligibility vs failure), read/write region pairing, PLF precedence, env var reframed as unsupported override.
- Standardize on is_operation_supported_by_gateway20().
- Add Status/Date/Authors header, numbered TOC, Q1 through Q4 open questions, Related Specs cross-links, header-name wire-to-constant table, range-header wire format notes, wire-format ambiguity resolutions (length-inclusive, UUID byte order, payload-presence).
- Expand test matrix with error-case EPK, StoredProc-rejected assertion, Bulk vs Batch, failure-fallback unwind, PLF precedence.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses three findings from the PR Deep Reviewer second pass:
- F-A: replace the non-existent epk_length_aware_cmp citation with EffectivePartitionKey's Ord/cmp impl, cite the actual epk_cmp_* tests in container_routing_map.rs and the binary_search_by consumer site. Point PR Azure#4087 at the correct claim.
- F-B: fix the numerically wrong UUID worked example. The previous example for 12345678-1234-5678-1234-567812345678 wrote MSB bytes 78 56 34 12 34 12 78 56, conflating writeLongLE with byte-reversal of the hyphen groups. Replace with 0a1b2c3d-4e5f-6789-abcd-ef0123456789 so MSB and LSB give visually distinct LE sequences.
- F-C: add a "Proxy unreachable definition" subsection enumerating transport-level (TCP refuse/timeout, TLS handshake, HTTP/2 GOAWAY, reqwest::Error connect/timeout/request before any status) and HTTP-infrastructure classes (502, 504, 503-without-Cosmos-substatus). Explicitly exclude responses carrying a Cosmos sub-status. Defer to TRANSPORT_PIPELINE_SPEC for broader classification. Cross-reference from the Retry Decision Table.
Also add a "Java parity" subsection to Phase 4 documenting that ThinClientStoreModel extends RxGatewayStoreModel, that none of the Java retry policies have thin-client-specific code, and that the Rust failure-fallback counter is more thin-client-aware than Java's. Flag a Java behavioral nuance worth NOT replicating: Java marks the gateway endpoint (not the thin-client endpoint) unavailable on a thin-client 503.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per follow-up review, remove the Phase 4 "Java parity" subsection. The cross-SDK observations belong in design discussion, not in the shipping design spec. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Q1 (HTTP/2 vs ALPN): resolved. Gateway 2.0 is HTTP/2-only with prior knowledge; the proxy does not accept HTTP/1.x. Negotiation failure feeds the failure-fallback counter rather than downgrading.
- Q3 (EPK range header names): resolved. Proxy requires the Java names x-ms-thinclient-range-min / -max. Phase 2 introduces new THINCLIENT_RANGE_MIN / _MAX constants; START_EPK / END_EPK are not emitted on the Gateway 2.0 path.
- Q4 (failure-fallback thresholds): clarified initial values (N=3 in 30s sliding window, 60s cooldown) and noted the live test pipeline is the tuning surface; thresholds are internal, not customer-tunable.
Updates Phase 2 header naming table and §3.1 / §3.3 references accordingly.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Existing retry policies (ClientRetryPolicy and friends) already cover the rows in that table; the spec was duplicating cross-cutting behavior. Updated the surrounding "not Proxy unreachable" bullets to reference the existing retry path instead of the removed table. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Java's thin client has no equivalent mechanism: ThinClientStoreModel extends RxGatewayStoreModel, model selection is per-request and stateless, and the existing ClientRetryPolicy / WebExceptionRetryPolicy chain handles transport errors, 502/503/504, and regional unavailability uniformly across both transport modes. Rust takes the same posture: no per-partition counter, no sticky standard-gateway state, no cooldown timer, no Proxy-unreachable classification, no new gateway20_retry.rs state machine.
Removed:
- Failure fallback row from the Phase 4 fallback taxonomy
- Proxy unreachable definition subsection
- Failure-fallback references in Phase 4 retry list, files-changed, and test matrix
- Open Question Q4 (thresholds), which is no longer applicable
- Failure-fallback counter mentions in §3.1 and Q1
The single remaining fallback is the per-request Eligibility fallback (operation not supported by Gateway 2.0 -> standard gateway), which is unrelated to failure handling.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
RNTBD is "Real Name To Be Determined" - a placeholder name that stuck, not "Reliable Network Transfer Binary Data" (a backronym that LLM analysis tends to invent). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two fixes for the PR analyze stage:
- cspell: add inline `cspell:ignore` directive for spec-specific jargon (THINCLIENT, thinclient, Mgmt, cutover, directconnectivity, footgun, cooldown, ALPN). Scoped to this file rather than the global word list since these are spec-only terms.
- Link verification: convert relative sibling-spec links (TRANSPORT_PIPELINE_SPEC.md, PARTITION_KEY_RANGE_CACHE_SPEC.md, PARTITION_LEVEL_FAILOVER_SPEC.md) to absolute GitHub URLs as required by the Verify-Links guideline (sdk/ paths are not in allow-relative-links.txt).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Timeout policy: note that any Gateway 2.0-specific timeout tuning is deferred to a follow-up.
- PLF interaction: PLF picks the region; within that region Gateway 2.0 is preferred whenever a gateway20_url is available, otherwise the request falls back to standard gateway. Removes the previous "PLF wins" framing that implied PLF always defeats Gateway 2.0.
- Drop the 408/503/410 retry-behavior bullets. The section already states retry policies are identical to standard gateway, so re-listing them risked drift from the canonical policy.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Apply Kevin's unresolved review feedback on PR Azure#4223:
- Reword §1 Overview to say "RNTBD binary serialization over the HTTP/2 protocol" (clearer separation of serialization vs. transport).
- Soften the SLA latency bullet in §2 Key Benefits to "plans to provide contractual latency commitments". We have not updated contractual terms yet, so avoid overpromising ahead of broad-usage measurement.
- Add a cross-partition query aggregation bullet under §2 Design Philosophy → SDK Responsibility, noting it stays client-side under Gateway 2.0 (no server-side aggregation).
- Fix the Protocol row in the Connection Mode Comparison table: HTTP/REST → REST/HTTP.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…iews' into tvaron3/gateway-2.0-design
Updates the Gateway 2.0 design spec in response to PR Azure#4223 review feedback from analogrelay and FabianMeiswinkel.
- Routing ownership: clarify SDK keeps regional + partition-level routing; only replica-level routing within a partition moves to the proxy.
- API correctness: replace the fictional 'CosmosClient::create_item(T)' walkthrough with the real 'ContainerClient::create_item(partition_key, item, options)' signature in sections 4.1 and 4.2.
- SDK <-> driver boundary: mark section 3.6-10 resolved. Gateway-2.0 constants ('THINCLIENT_PROXY_*', range headers, etc.) live exclusively in 'azure_data_cosmos_driver::constants' with no SDK re-export. The SDK invokes the generic 'CosmosDriver::execute_operation' interface and the driver decides Gateway 2.0 vs standard gateway internally.
- SPROC: drop the .NET-vs-Java framing. Stored-procedure execution is out of scope for Rust SDK GA; eligibility fallback routes any incoming SPROC request to the standard gateway.
- New gap (3.6-11): pre-Phase-2 audit deliverable to enumerate every EPK/PartitionKeyRange-shaped struct across both crates and consolidate on a single canonical type.
- Operator override: restore 'CosmosClientOptions::gateway20_disabled' (default false) as the single supported public kill-switch. No env var (intentional discouragement of casual / fleet-wide enablement). Carries an explicit warning that flipping it voids Gateway 2.0's latency SLA and impacts 24/7 Microsoft support eligibility for performance regressions.
- New gap (3.6-12): retry behavior for 449 (Retry-With: same endpoint, standard backoff, no region switch) and 404 / sub-status 1002 (PARTITION_KEY_RANGE_GONE: refresh PKRange cache, prefer remote region; PLF region wins when PLF has pinned the PKRangeId). Phase 6 test matrix expanded with both rows.
- HPK gating refinement (carried forward from round 4): only emit 'x-ms-documentdb-partitionkey' alongside the EPK header(s) when the request carries the FULL partition key (point ops on any container, and full-key single-logical-partition queries on HPK containers). Prefix queries on HPK containers emit the EPK-range headers only.
- Prose scrub: 'thin client' / 'thin-client' rewritten to 'Gateway 2.0' in body prose. Wire-header literals ('x-ms-thinclient-*', 'thinClientReadableLocations'), Rust API symbol names ('has_thin_client_endpoints'), and .NET / Java symbol references retained verbatim.
- cspell pass clean.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace 'footgun' with 'pitfall' in two prose locations (the Java header-mutation hazard heading and the Range semantics blockquote) and drop 'footgun' from the inline cspell:ignore directive. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Drop two mentions of internal ADO PR #2031635 from the ReadConsistencyStrategy section. Java PR #48787 and .NET PR #5685 remain as the public cross-SDK references. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Convert open-question item 11 from a deferred audit deliverable into a concrete decision. Audit results documented inline:
- Driver crate is canonical: EpkRange<T>, PartitionKeyRange (typed EffectivePartitionKey bounds), and the EffectivePartitionKey newtype with compute_range().
- SDK-crate analogs (routing::range::Range, routing::partition_key_range, hash::EffectivePartitionKey) are NOT used on the Gateway 2.0 path and remain only for legacy non-Gateway-2.0 SDK callers.
Phase 2 EPK header injection MUST reuse the driver-crate types and MUST NOT introduce a new EPK-range struct or depend on any SDK-crate analog. Consistent with item 10 (no SDK Gateway-2.0 surface).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Drop the BaseProxyClientHttpMessageHandler attribution paragraph and the HPK full-key gating bullet section (single-component vs hierarchical, full-key vs prefix-key emission rules), and remove the matching cross-references downstream:
- Header table row for x-ms-documentdb-partitionkey: simplified to describe co-emission with x-ms-effective-partition-key on point / single-logical-partition ops, without the prefix-key gating clause.
- Header injection flow step 4: trimmed to just emit the EPK header; PK-header gating clause removed.
- Test matrix: dropped the 'HPK PK+EPK pairing (full-key gating)' row.
The spec no longer prescribes any HPK-specific PK-header gating; that behavior can be re-introduced in a follow-up if needed.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the two Gateway-2.0-specific HTTP headers (x-ms-thinclient-account-name, x-ms-thinclient-regional-account-name) with the existing RNTBD GlobalDatabaseAccountName token (0x00CE, String, optional), carried in the RNTBD metadata stream on every Gateway 2.0 request.
- Headers table: dropped both -account-name and -regional-account-name rows.
- Prose: replaced the proxy-tenant-routing section with a brief Tenant-identification note pointing to the RNTBD token.
- Injection flow step 3: now serializes the RNTBD token instead of setting two HTTP headers.
- Test matrix: row reframed to assert presence of the RNTBD token in the request metadata stream.
This drops the regional-account-name carrier entirely; the proxy uses the global account-name only.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Rename THINCLIENT_PROXY_* Rust constants to GATEWAY20_* family (wire header strings remain x-ms-thinclient-* server-defined)
- 449 retry policy now explicitly exponential backoff (separate budget from 410/Gone)
- Strengthen positive-term ban: forbid is_gateway20_allowed / gateway20_allowed / enable_gateway20 anywhere in driver, SDK, perf crate, env vars, or test wiring; only negative-term names permitted
- Restore (§3.5) reference in EPK-computation bullet per reviewer
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Drop @analogrelay attribution and the Kiran context pointer; replace with neutral phrasing. Also remove the names from the cspell:ignore directive. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per the negative-term naming rule (default values mean Gateway 2.0 enabled), rename the per-request gating flag and invert its logic. Update §3.1 formula, the Phase 1 request-flow step, and inline references. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove firewall-rules bullet and table row (thread #4: port-based firewall framing applies to all transports, not Gateway 2.0 specifically).
- Remove HTTP/2 multiplexing bullet and tighten protocol cell (thread #5: multiplexing is a transport feature shared with Gateway V1, not a Gateway-2.0-specific benefit).
- Reword §3 routing-decision scope: 'once per request' -> 'once per logical operation, inherited by retries and sub-requests' (thread #6). This prevents mid-operation transport-mode flips that would fragment diagnostics and break session-token affinity.
- §4.1 449 retry: introduce a new ThrottleAction::RetryWith { delay, new_state: RetryWithState } variant in driver/pipeline/components.rs and extend decide_throttle_action in transport_pipeline.rs (thread #7), guaranteeing structurally that the 449 budget is independent of the 410/Gone and 429 budgets.
- §4.2 sub-status correction (thread #8): the parent status is 404, and the sub-status name is READ_SESSION_NOT_AVAILABLE (not PARTITION_KEY_RANGE_GONE; that name belongs to 410/1002). Body rewritten to reflect the session-token-stale semantics: existing 404 retry path applies; the only Gateway-2.0-specific deviation is that we do NOT refresh the PKRange cache on 404/1002. Test-matrix row updated.
- Test coverage: add HPK + Gateway 2.0 row exercising full vs partial PK forms (full -> x-ms-effective-partition-key, partial -> x-ms-thinclient-range-min / -max via EffectivePartitionKey::compute_range()).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the 'follow existing 404 path' body in §4.2 with the prior 'prefer remote region + PLF precedence' wording. Update the matching test-matrix row to assert prefer-remote routing, PLF override, and the no-PKRange-refresh invariant on 404/1002. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Introduces the rntbd module: request frame serializer, response frame
deserializer, token codecs (Byte/UShort/ULong/Long/ULongLong/LongLong/Guid/
SmallString/String/ULongString/SmallBytes/Bytes/ULongBytes/Float/Double),
and HTTP-status mapping with optional sub-status enrichment.
Activity-Id is encoded as the 16-byte [u64 LE msb][u64 LE lsb] pair on the
wire (matching Java/.NET RNTBD), distinct from the metadata Guid token
encoding (MS GUID fields LE via Uuid::as_fields).
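The two encodings named above can be compared on the worked UUID from the earlier review fix, 0a1b2c3d-4e5f-6789-abcd-ef0123456789. This is a minimal sketch, not the driver's code: the function names are hypothetical, and the UUID is passed as a plain `u128` only to avoid depending on the `uuid` crate.

```rust
/// Activity-Id layout: most significant 64 bits little-endian, followed by
/// the least significant 64 bits little-endian.
/// Hypothetical helper; the UUID is given as its 128-bit integer value.
pub fn encode_activity_id(uuid: u128) -> [u8; 16] {
    let msb = (uuid >> 64) as u64;
    let lsb = uuid as u64; // truncating cast keeps the low 64 bits
    let mut out = [0u8; 16];
    out[..8].copy_from_slice(&msb.to_le_bytes());
    out[8..].copy_from_slice(&lsb.to_le_bytes());
    out
}

/// Microsoft GUID field layout: the first three fields (u32, u16, u16) are
/// little-endian, and the trailing 8 bytes keep their original order.
/// Hypothetical helper standing in for the metadata Guid token encoding.
pub fn encode_guid_token(uuid: u128) -> [u8; 16] {
    let mut out = uuid.to_be_bytes();
    out[0..4].reverse();
    out[4..6].reverse();
    out[6..8].reverse();
    out
}
```

For the example UUID the Activity-Id form starts with 89 67 5f 4e 3d 2c 1b 0a, while the Guid token form starts with 3d 2c 1b 0a 5f 4e 89 67, which is why the two must not share an encoder.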
The capabilities header advertises bitmask "9" (PartitionMerge |
IgnoreUnknownRntbdTokens). Unlike Java which advertises "11" (also
ChangeFeedWithStartTimeFromBeginning), Rust intentionally skips that bit
in Slice 1 because the underlying behavior is not yet implemented;
advertising an unimplemented capability would violate the contract.
Per AGENTS.md ("Prefer Result::Err over panicking"), serialize() and the
underlying token write_to / write_len_prefixed_* helpers all return
azure_core::Result so oversized inputs (frame > u32, SmallString > 255,
String > 64 KB) surface as data-conversion errors instead of panicking.
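The error-instead-of-panic behavior for oversized inputs can be sketched as below. The limits (255 bytes for SmallString, 64 KB for String) come from the commit message; the prefix widths (one byte and a little-endian `u16`), the `EncodeError` type, and the function names are assumptions for illustration, and the real helpers return `azure_core::Result` rather than this local error type.

```rust
/// Hypothetical local error type standing in for a data-conversion error.
#[derive(Debug, PartialEq)]
pub enum EncodeError {
    TooLong { kind: &'static str, len: usize, max: usize },
}

/// Writes a string with an assumed one-byte length prefix (max 255 bytes).
pub fn write_small_string(buf: &mut Vec<u8>, s: &str) -> Result<(), EncodeError> {
    let len = u8::try_from(s.len()).map_err(|_| EncodeError::TooLong {
        kind: "SmallString",
        len: s.len(),
        max: u8::MAX as usize,
    })?;
    buf.push(len);
    buf.extend_from_slice(s.as_bytes());
    Ok(())
}

/// Writes a string with an assumed little-endian u16 length prefix.
pub fn write_string(buf: &mut Vec<u8>, s: &str) -> Result<(), EncodeError> {
    let len = u16::try_from(s.len()).map_err(|_| EncodeError::TooLong {
        kind: "String",
        len: s.len(),
        max: u16::MAX as usize,
    })?;
    buf.extend_from_slice(&len.to_le_bytes());
    buf.extend_from_slice(s.as_bytes());
    Ok(())
}
```

Using `try_from` for the length conversion makes the overflow case an ordinary `Err` branch, so nothing is written to the buffer when the input is too long.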
The module is gated by #[allow(dead_code, unused_imports)] and is not yet
wired into the transport pipeline; Slice 2 will add the dispatcher and
operation-eligibility filter.
Implements R1, R2, R3 and AC1-AC4 from .coding-harness/spec.json, mapping
to GATEWAY_20_SPEC.md §5 Phase 1.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Lays the foundation for Gateway 2.0 request handling without yet routing
any traffic through it. Three deliverables:
* **Eligibility helper** — pure `is_operation_supported_by_gateway20`
that returns `true` only for Document × {Create, Read, Replace, Upsert,
Delete, Query, SqlQuery, QueryPlan, ReadFeed, Batch}. Both the outer
`ResourceType` and inner `OperationType` matches are exhaustive (no
wildcard arms) so any new enum variant is a compile-time error,
forcing an explicit eligibility decision rather than a silent
fail-closed default.
* **Account-name extraction** — `AccountEndpoint::global_database_account_name`
parses the host's first label and returns it for Cosmos endpoints
(`*.documents.azure.{com,us,cn}`). Returns `None` for the emulator,
IPv4/IPv6 literals, and custom domains; Slice 3 will read this when
emitting the RNTBD `GlobalDatabaseAccountName` token.
* **Constants relocation** — new `azure_data_cosmos_driver::constants`
owns the canonical `x-ms-thinclient-*` and `x-ms-effective-partition-key`
wire strings under `GATEWAY20_*` identifiers. The SDK's
`THINCLIENT_PROXY_OPERATION_TYPE` / `_RESOURCE_TYPE` are now
`#[deprecated]` re-exports of the new driver constants, preserving
public API while migrating consumers. `COSMOS_ALLOWED_HEADERS` is
extended to keep logging behavior unchanged.
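The exhaustive-match approach described for the eligibility helper can be sketched as follows. The enums here are heavily trimmed stand-ins (the `Execute`, `Database`, and `DocumentCollection` variants are assumptions chosen for illustration), so this shows the pattern rather than the driver's actual variant list.

```rust
/// Trimmed, hypothetical stand-in for the driver's resource enum.
#[derive(Clone, Copy, Debug)]
pub enum ResourceType {
    Document,
    StoredProcedure,
    Database,
    DocumentCollection,
}

/// Trimmed, hypothetical stand-in for the driver's operation enum.
#[derive(Clone, Copy, Debug)]
pub enum OperationType {
    Create, Read, Replace, Upsert, Delete,
    Query, SqlQuery, QueryPlan, ReadFeed, Batch,
    Execute,
}

/// Both matches list every variant and use no `_` arm, so adding a variant
/// to either enum stops compilation until an eligibility decision is made.
pub fn is_operation_supported_by_gateway20(resource: ResourceType, op: OperationType) -> bool {
    match resource {
        ResourceType::Document => match op {
            OperationType::Create
            | OperationType::Read
            | OperationType::Replace
            | OperationType::Upsert
            | OperationType::Delete
            | OperationType::Query
            | OperationType::SqlQuery
            | OperationType::QueryPlan
            | OperationType::ReadFeed
            | OperationType::Batch => true,
            OperationType::Execute => false,
        },
        ResourceType::StoredProcedure
        | ResourceType::Database
        | ResourceType::DocumentCollection => false,
    }
}
```

The cost of this style is a little verbosity; the benefit is that a new operation cannot silently default to either transport.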
Helper and account-name accessor are intentionally `#[allow(dead_code)]`
in this slice — Slice 3 wires them into the dispatch path. No routing,
body-wrapping, header-injection, or response-unwrap changes here.
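The account-name extraction rule can be illustrated with a free function over a host string. This is a sketch under assumptions: the real accessor is a method on `AccountEndpoint`, whereas this version takes the host directly, and the handling of mixed-case or multi-label hosts is a guess rather than documented behavior.

```rust
/// Host suffixes treated as Cosmos endpoints, per the commit message.
const COSMOS_HOST_SUFFIXES: [&str; 3] = [
    ".documents.azure.com",
    ".documents.azure.us",
    ".documents.azure.cn",
];

/// Returns the first host label for Cosmos endpoints, and `None` for
/// anything else (emulator hosts, IP literals, custom domains).
/// Hypothetical free-function form of the accessor.
pub fn global_database_account_name(host: &str) -> Option<&str> {
    // Byte-index slicing below is only safe on ASCII input.
    if !host.is_ascii() {
        return None;
    }
    for &suffix in &COSMOS_HOST_SUFFIXES {
        if host.len() > suffix.len() {
            let (prefix, tail) = host.split_at(host.len() - suffix.len());
            if tail.eq_ignore_ascii_case(suffix) {
                return prefix.split('.').next().filter(|label| !label.is_empty());
            }
        }
    }
    None
}
```

Returning `Option` rather than an empty string lets the routing code treat a missing account name as an explicit reason to stay on the standard gateway.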
Tests: +5 unit tests (constants pinning + distinctness, eligibility
matrix exhaustiveness, stored-proc explicit ineligibility, host
extraction table). Total 721 lib tests (was 716), all passing.
Validation: cargo fmt, cargo clippy --all-features --all-targets
-- -D warnings, cargo doc --no-deps, and cargo test --all-features all
clean for both azure_data_cosmos and azure_data_cosmos_driver.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…e 3a)
Wires the Slice 2 eligibility helper into operation routing and fixes a
latent connection-pool keying bug for Gateway 2.0 endpoints.
Changes:
- pipeline/components.rs: add endpoint_key field to RoutingDecision so the
  pool key is captured alongside the chosen URL rather than being recomputed
  from the underlying CosmosEndpoint cache.
- pipeline/operation_pipeline.rs:
  - resolve_endpoint now considers Gateway 2.0 only when the endpoint
    advertises it AND the account name is parseable AND the operation is
    supported by Gateway 2.0 per is_operation_supported_by_gateway20.
  - Selected URL's authority is used to derive endpoint_key when routing
    through Gateway 2.0; otherwise the existing endpoint cache key is reused.
  - account_endpoint.global_database_account_name().is_some() is inlined at
    the call site; full Option<String> threading is deferred to Slice 3b/c.
- transport/mod.rs: re-export is_operation_supported_by_gateway20 for
  pipeline consumers.
Tests:
- Three new resolve_endpoint tests cover (1) ineligible operation falling
back to Gateway, (2) missing account name falling back to Gateway, and
(3) Gateway 2.0 routing producing an endpoint_key derived from the
Gateway 2.0 authority (not the gateway1 cache).
- Existing tests updated for the new resolve_endpoint signature and the
new RoutingDecision field.
Validation: cargo fmt, cargo clippy --all-features --all-targets -D warnings,
cargo test -p azure_data_cosmos_driver --all-features --lib (724 passed).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
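The three-way gate and the endpoint-key derivation described in this commit can be sketched as below. All types are simplified stand-ins, the `authority` helper is a naive string split rather than a real URL parser, and the signature of `resolve_endpoint` is an assumption, not the driver's.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TransportMode {
    Gateway,
    Gateway20,
}

/// Simplified stand-in for the driver's RoutingDecision.
#[derive(Debug, PartialEq, Eq)]
pub struct RoutingDecision {
    pub transport_mode: TransportMode,
    pub url: String,
    pub endpoint_key: String,
}

/// Naive authority extraction: drop the scheme, keep text up to the first '/'.
fn authority(url: &str) -> &str {
    let rest = url.split_once("://").map_or(url, |(_, r)| r);
    rest.split('/').next().unwrap_or(rest)
}

/// Gateway 2.0 is chosen only when the endpoint advertises a Gateway 2.0 URL
/// AND the account name parsed AND the operation is eligible.
pub fn resolve_endpoint(
    gateway_url: &str,
    gateway_cache_key: &str,
    gateway20_url: Option<&str>,
    account_name: Option<&str>,
    operation_supported: bool,
) -> RoutingDecision {
    match gateway20_url {
        Some(url) if account_name.is_some() && operation_supported => RoutingDecision {
            transport_mode: TransportMode::Gateway20,
            url: url.to_string(),
            // Pool key comes from the selected URL, not the gateway cache.
            endpoint_key: authority(url).to_string(),
        },
        _ => RoutingDecision {
            transport_mode: TransportMode::Gateway,
            url: gateway_url.to_string(),
            endpoint_key: gateway_cache_key.to_string(),
        },
    }
}
```

Capturing the key inside the decision keeps the connection pool and the chosen URL in agreement, which is the keying bug this slice addresses.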
Final slice of Gateway 2.0 vertical-slice plan §5: wrap a signed Cosmos HTTP request as an RNTBD request frame on the way out, and decode the proxy's RNTBD response frame back into a synthetic `HttpResponse` on the way in. With Slice 3a's routing eligibility gating, point operations on fully-specified partition keys now flow end-to-end over Gateway 2.0 when the routing decision selects it.
The wrap helper consumes a signed `HttpRequest`, reuses the exact `x-ms-date` and `Authorization` values written by `sign_request` (so signature verification still holds), packs the 11 RNTBD request tokens required by the proxy (Authorization, PayloadPresent, Date, ConsistencyLevel, DatabaseName, CollectionName, DocumentName, TransportRequestId, EffectivePartitionKey, SDKSupportedCapabilities, GlobalDatabaseAccountName), and returns a brand-new `HttpRequest` carrying just the outer `User-Agent` and `x-ms-activity-id` headers.
The unwrap helper runs only on outer HTTP 200; outer non-200 responses (proxy or transport errors) pass through unchanged so they surface as their actual status, not a synthesized `TRANSPORT_GENERATED_503`.
Consistency resolution lives in a new `resolve_effective_consistency` helper that implements the precedence chain from spec §5.2 at operation-pipeline scope, then flows through `TransportRequest` so the wrap helper never inspects HTTP headers for consistency.
Failure modes:
- Wrap failure (missing signed header, malformed activity_id, bad resource link, missing account name) → `CLIENT_GENERATED_400` TransportError, `RequestSentStatus::NotSent`. Adds a new `SubStatusCode::CLIENT_GENERATED_400` (20400) mirroring the existing `CLIENT_GENERATED_401` pattern.
- Unwrap failure (outer-200 with undecodable RNTBD body, or inner status outside 100..=599) → `TRANSPORT_GENERATED_503`, `RequestSentStatus::Sent`.
- Outer non-200 → outer triple passes through unchanged, no unwrap.
Files:
- `transport/gateway20_dispatch.rs` (new): `WrapInputs`, `wrap_request_for_gateway20`, `unwrap_response_for_gateway20`, `parse_resource_names`, `effective_partition_key_bytes`, `next_transport_request_id` (process-wide AtomicU32), 13 unit tests.
- `transport/rntbd/tokens.rs`: new `RntbdRequestToken` enum with 11 verified IDs and `Token::*` named constructors. IDs cross-referenced against Java's `RntbdConstants.RntbdRequestHeader`.
- `transport/transport_pipeline.rs`: wrap call site between `sign_request` and `execute_http_attempt` (gated on `transport_mode == Gateway20`); unwrap call site inside `finalize_http_attempt`'s Response branch before `map_http_response_payload` (gated on `transport_mode == Gateway20` AND outer status == 200); new `gateway20_wrap_error_result`, `gateway20_unwrap_error_result`; `TransportPipelineContext` carries `account_name`; 6 new integration tests.
- `pipeline/components.rs`: `TransportRequest` carries `transport_mode`, `operation_type`, `partition_key`, `partition_key_definition`, `effective_consistency`.
- `pipeline/operation_pipeline.rs`: real `account_name: Option<String>` binding, `effective_consistency` resolved at op-pipeline scope, all five new fields populated in `build_transport_request`.
- `options/read_consistency.rs`: new `resolve_effective_consistency(strategy, account_default) -> DefaultConsistencyLevel` per spec §5.2, with 4×5 table test.
- `models/cosmos_status.rs`: `SubStatusCode::CLIENT_GENERATED_400` (20400) and `CosmosStatus::CLIENT_GENERATED_400`.
- `transport/cosmos_headers.rs`, `transport/mod.rs`, `options/mod.rs`: exports for the new helpers/constants.
20 new tests; 744 passed, 0 failed.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
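The response-side failure modes can be summarized as a small decision function. This is a sketch, assuming the RNTBD decode step is modelled as a closure that yields the inner status or `None` when the body cannot be decoded; `UnwrapOutcome` and `classify_gateway20_response` are hypothetical names.

```rust
/// Hypothetical summary of what the unwrap step hands to the rest of the pipeline.
#[derive(Debug, PartialEq, Eq)]
pub enum UnwrapOutcome {
    /// Outer status was not 200: surface it unchanged, never decode.
    PassThroughOuter { status: u16 },
    /// Outer 200 with a decodable frame and a plausible inner status.
    Inner { status: u16 },
    /// Outer 200 but the frame was undecodable or carried an invalid status.
    TransportGenerated503,
}

/// `decode_inner_status` stands in for RNTBD frame decoding and is invoked
/// only when the outer HTTP status is exactly 200.
pub fn classify_gateway20_response(
    outer_status: u16,
    decode_inner_status: impl FnOnce() -> Option<u32>,
) -> UnwrapOutcome {
    if outer_status != 200 {
        return UnwrapOutcome::PassThroughOuter { status: outer_status };
    }
    match decode_inner_status() {
        Some(status) if (100..=599).contains(&status) => UnwrapOutcome::Inner {
            status: status as u16, // safe: range-checked above
        },
        _ => UnwrapOutcome::TransportGenerated503,
    }
}
```

Passing the decoder as a closure makes the "no unwrap on outer non-200" rule visible in the control flow: the closure is simply never called on that path.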
Bumps `next_transport_request_id` from `Ordering::Relaxed` to `Ordering::AcqRel`. While `Relaxed` already guarantees uniqueness on `fetch_add` (the operation is atomic regardless of ordering), the stronger ordering ensures the increment is globally visible in diagnostic traces — preventing any apparent ID-collision confusion when two concurrent requests are inspected in logs. Found in deep-review pass over the Gateway 2.0 stack. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
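The commit's own observation that `fetch_add` yields unique values regardless of ordering can be checked with a small concurrent sketch. The counter below uses `Ordering::Relaxed` to illustrate exactly that point; the static name, the starting value of 1, and the helper functions are assumptions and not the driver's code.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// Process-wide counter; the starting value is an assumption.
static NEXT_TRANSPORT_REQUEST_ID: AtomicU32 = AtomicU32::new(1);

/// Each call atomically claims one value, so no two callers share an ID
/// (until the u32 wraps around).
pub fn next_transport_request_id() -> u32 {
    NEXT_TRANSPORT_REQUEST_ID.fetch_add(1, Ordering::Relaxed)
}

/// Draws `per_thread` IDs on each of `threads` worker threads.
pub fn ids_from_threads(threads: usize, per_thread: usize) -> Vec<u32> {
    let handles: Vec<thread::JoinHandle<Vec<u32>>> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                (0..per_thread).map(|_| next_transport_request_id()).collect()
            })
        })
        .collect();
    handles
        .into_iter()
        .flat_map(|h| h.join().expect("worker thread panicked"))
        .collect()
}

/// True when the slice contains no repeated ID.
pub fn all_unique(ids: &[u32]) -> bool {
    let set: HashSet<u32> = ids.iter().copied().collect();
    set.len() == ids.len()
}
```

Uniqueness here follows from the atomicity of the read-modify-write itself; the ordering argument only governs how the increment is sequenced relative to other memory operations.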
Adds the Phase 6 deliverables called out in `docs/GATEWAY_20_SPEC.md` (lines ~500-594):
**CI infrastructure (NEW)**
- `sdk/cosmos/ci-gateway20.yml`: dedicated PR-and-manual-dispatch pipeline that runs Gateway 2.0 live tests against a pre-provisioned thin-client account. Reads `AZURE_COSMOS_GW20_ENDPOINT` / `AZURE_COSMOS_GW20_KEY` from pipeline secrets (per spec Q2).
- `sdk/cosmos/live-gateway20-matrix.json`: matrix with `gateway20` and `gateway20_multi_region` test categories.
**Driver pipeline tests (NEW)**
- `azure_data_cosmos_driver/tests/gateway20_pipeline_tests.rs`: 7 integration tests gated on the `__internal_mocking` feature. Uses a capturing `HttpClientFactory` to inspect outgoing requests. Coverage: operator-override pool flag, V1 dual-consistency-header invariant, V2 dual-token contract lock, capabilities header pin (`x-ms-cosmos-sdk-supportedcapabilities = "9"`). Live-account companions for stored-proc fallback, diagnostics validation, and operator override are stubbed with TODO(Phase 6) markers; they require either a public SDK Gateway 2.0 toggle or per-request diagnostics surfaces that don't exist yet.
**Driver fault injection (EDIT)**
- `driver_fault_injection.rs`: adds 3 emulator-gated contract locks: 503 → regional failover, 408 → cross-region for reads, 404/1002 → remote-preferred without PKRange refresh. Today the rules fire on whichever transport is selected at dispatch (the `FaultInjectionCondition` API does not yet support a per-transport-kind filter); each test carries a TODO(Phase 6) describing the tightening once that filter lands.
**SDK tests (NEW + EDIT)**
- `azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs`: 6 scaffolded E2E stubs gated on `test_category = "gateway20"` plus the `AZURE_COSMOS_GW20_ENDPOINT/_KEY` env vars. Bodies are intentionally empty until `CosmosClientOptions` exposes a public Gateway 2.0 toggle; the test names lock in the contract.
- `cosmos_fault_injection.rs`: adds a Gateway 2.0 ConnectionError fallback contract lock with a TODO(Phase 6) marker.
- `mod.rs`: registers the new `gateway20_e2e` module.
- `azure_data_cosmos/build.rs`: adds `gateway20` to the recognized `test_category` cfg values so the new e2e file compiles without `--cfg` warnings.
**Verification:** `cargo fmt`, `cargo clippy --all-features --all-targets -- -D warnings`, `cargo build --all-features --tests`, and `cargo test --all-features` all pass on both crates. Driver maintains the 744-test baseline; SDK maintains the 463-test baseline.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds a "Capability bit composition" subsection to GATEWAY_20_SPEC.md under "SDK-supported-capabilities advertisement" that explains why the Rust driver advertises bitmask `9` while Java advertises `11`. The table breaks down each bit (PartitionMerge=1, IgnoreUnknownRntbdTokens=8, plus an additional Java-only capability at bit 1=2 left unnamed pending verification against Java source) and explicitly states that the Rust driver only advertises capabilities it implements end-to-end. Adding any new bit requires implementing the behavior first, then incrementing `SUPPORTED_CAPABILITIES_BITS` in `cosmos_headers.rs` and re-pinning the Phase 6 header-value test. This addresses one of the documented follow-ups from PR Azure#4319. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
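The bit arithmetic behind `9` versus `11` can be written out directly. The bit values below come from the commit message; the `u64` width, the `JAVA_ONLY_BIT` name for the unnamed bit, and the `capabilities_header_value` helper are assumptions made for this sketch.

```rust
/// Bit 0 (value 1), per the commit message.
pub const PARTITION_MERGE: u64 = 1 << 0;
/// Bit 1 (value 2): advertised by Java only. The name is a placeholder.
pub const JAVA_ONLY_BIT: u64 = 1 << 1;
/// Bit 3 (value 8), per the commit message.
pub const IGNORE_UNKNOWN_RNTBD_TOKENS: u64 = 1 << 3;

/// Only capabilities implemented end-to-end are OR-ed in: 1 | 8 = 9.
pub const SUPPORTED_CAPABILITIES_BITS: u64 = PARTITION_MERGE | IGNORE_UNKNOWN_RNTBD_TOKENS;

/// The header carries the bitmask as a decimal string.
pub fn capabilities_header_value(bits: u64) -> String {
    bits.to_string()
}
```

Composing the constant from named bits, rather than hard-coding `9`, keeps the header-value test meaningful when a new capability is added.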
Per Gateway 2.0 spec §3, options that gate Gateway 2.0 must use negative-term names so the boolean default of `false` corresponds to the GA target state (Gateway 2.0 enabled). `is_gateway20_allowed` predated that rule.
Changes:
* `ConnectionPoolOptions::is_gateway20_allowed` → `gateway20_disabled` (semantics inverted): field, getter, builder field, builder setter.
* `ConnectionPoolOptionsBuilder::with_is_gateway20_allowed` → `with_gateway20_disabled` (semantics inverted).
* Removed the `AZURE_COSMOS_CONNECTION_POOL_IS_GATEWAY20_ALLOWED` env var lookup. Spec §3 forbids env-var toggles for Gateway 2.0; getter doc-comment now explains why.
* HTTP/2-disabled-forces-disabled rule preserved (`gateway20_requires_http2` test still verifies it).
* Pre-GA the default stays at `true` (disabled) with an explicit TODO to flip to `false` once Slice 3d (EPK cutover), HPK partial-PK, and continuation-token format are complete. Defaulting on while the codepath is incomplete would route customer traffic through an unfinished pipeline.
* All call sites in driver, transport tests, pipeline tests, the `gateway20_e2e` doc, and the perf binary updated. Booleans inverted at every `capturing_runtime(...)` and builder call.
* TRANSPORT_PIPELINE_SPEC.md prose reference renamed in lock-step.
* Perf binary's `gateway20_disabled` diagnostic field is hardcoded `true` for now with TODO to wire from the SDK option once that builder method lands (Phase A item 5).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
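How the negative-term flag, the pre-GA default, and the HTTP/2 rule combine can be sketched with a toy builder. `PoolOptions` and `PoolOptionsBuilder` are simplified stand-ins for the driver types, `with_http2_enabled` is a hypothetical setter, and the HTTP/2 default of `true` is an assumption.

```rust
/// Simplified stand-in for the driver's connection-pool options.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PoolOptions {
    pub http2_enabled: bool,
    pub gateway20_disabled: bool,
}

#[derive(Debug, Default)]
pub struct PoolOptionsBuilder {
    http2_enabled: Option<bool>,
    gateway20_disabled: Option<bool>,
}

impl PoolOptionsBuilder {
    /// Hypothetical setter; the real HTTP/2 option name may differ.
    pub fn with_http2_enabled(mut self, value: bool) -> Self {
        self.http2_enabled = Some(value);
        self
    }

    pub fn with_gateway20_disabled(mut self, value: bool) -> Self {
        self.gateway20_disabled = Some(value);
        self
    }

    pub fn build(self) -> PoolOptions {
        let http2_enabled = self.http2_enabled.unwrap_or(true); // assumed default
        // Pre-GA default per the commit message: disabled unless requested.
        let requested = self.gateway20_disabled.unwrap_or(true);
        PoolOptions {
            http2_enabled,
            // Gateway 2.0 requires HTTP/2, so no HTTP/2 forces it off.
            gateway20_disabled: requested || !http2_enabled,
        }
    }
}
```

With this shape, flipping the pre-GA default later is a one-line change in `build`, and callers that never touch the flag pick up the new behavior automatically.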
These were re-exports of the canonical driver-level `GATEWAY20_OPERATION_TYPE` / `GATEWAY20_RESOURCE_TYPE` constants, marked `#[deprecated(since = "0.33.0")]`. The Gateway 2.0 work is pre-GA and there are no released callers depending on the old identifier, so back-compat aliases are not needed and only add API surface that we'd have to delete later.
Removes:
* `THINCLIENT_PROXY_OPERATION_TYPE`
* `THINCLIENT_PROXY_RESOURCE_TYPE`
The `COSMOS_ALLOWED_HEADERS` macro continues to reference the canonical driver constants directly (unchanged), so the headers stay on the logging allowlist.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds an optional `TransportKind` filter to fault injection rules so that callers can scope a rule to Gateway 1.x or Gateway 2.0 traffic without affecting metadata clients or the other dataplane transport.
Driver:
* `FaultInjectionCondition` gains a `transport_kind: Option<TransportKind>` field plus a `with_transport_kind` builder method.
* `FaultInjectionEvaluation` gains a `TransportKindMismatch` variant emitted when a rule restricts to a transport that this client does not serve.
* `FaultClient` now carries the bound transport kind (set at construction by `FaultInjectingHttpClientFactory` from `HttpClientConfig`). Metadata clients have `transport_kind == None` and so never match a rule that requires a specific transport; this prevents a Gateway-2.0 rule from accidentally firing on account discovery or other metadata traffic.
* `HttpClientConfig` records the transport kind for each constructor: metadata = None, dataplane gateway = Some(Gateway), dataplane Gateway 2.0 = Some(Gateway20).
* Three new unit tests in `fault_injection::http_client::tests` cover the match / mismatch / metadata-client cases.
SDK:
* `FaultInjectionCondition` mirrors the new field and builder method.
* `fault_injection::TransportKind` is re-exported from the driver so SDK consumers do not need to depend on the driver crate.
* `driver_bridge` translates the SDK-side `transport_kind` through to the driver builder.
Tests:
* The three Gateway 2.0 fault-injection tests in `driver_fault_injection.rs` now scope their rules to `TransportKind::Gateway20` and are gated behind the `gateway20` test category. The rule semantics are now correct, and the tests no longer fire spuriously on emulator account-discovery traffic.
* The matching SDK-side test in `cosmos_fault_injection.rs` is similarly scoped and gated. Once the public SDK Gateway 2.0 toggle lands, the assertion will flip from 'rule never fires' to 'read succeeds via the standard-gateway fallback'.
* `azure_data_cosmos_driver/build.rs` declares the new `gateway20` test_category value (the SDK `build.rs` already declared it).
Resolves the `with_transport_kind` follow-up from PR Azure#4319.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
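The matching rule described above can be sketched roughly as follows. This is a minimal illustration, not the driver's code: `Evaluation` and `evaluate_transport` are hypothetical names, and the assumption that an unscoped rule (`None`) matches every client is inferred from the filter being optional.

```rust
/// Transport a client is bound to. Variant names follow the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TransportKind {
    Gateway,
    Gateway20,
}

/// Hypothetical, trimmed-down outcome of evaluating one rule's transport filter.
#[derive(Debug, PartialEq, Eq)]
pub enum Evaluation {
    Matched,
    TransportKindMismatch,
}

/// `rule_kind` is the rule's optional filter; `client_kind` is what the client
/// was bound to at construction (`None` for metadata clients).
pub fn evaluate_transport(
    rule_kind: Option<TransportKind>,
    client_kind: Option<TransportKind>,
) -> Evaluation {
    match rule_kind {
        // Assumption: a rule without a filter applies to every client.
        None => Evaluation::Matched,
        // A scoped rule applies only to a client bound to the same transport.
        Some(required) if client_kind == Some(required) => Evaluation::Matched,
        // Different transport, or a metadata client with no transport kind.
        Some(_) => Evaluation::TransportKindMismatch,
    }
}
```

Under this sketch a metadata client (`client_kind == None`) can never satisfy a rule scoped to `TransportKind::Gateway20`, which is the property the commit relies on to keep account-discovery traffic out of scope.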
Adds a public `with_gateway20_disabled(bool)` method on `CosmosClientBuilder` that propagates the flag to the underlying driver's `ConnectionPoolOptionsBuilder`. This is the SDK-level entry point that operators use to opt in to (or out of) the Gateway 2.0 transport.
Pre-GA, the toggle defaults to `true` (Gateway 2.0 suppressed) so the behavioural change must be explicitly requested. The negative-term name mirrors the driver-side flag and follows the negative-term policy from GATEWAY_20_SPEC §3.
With the public toggle in place, fills the placeholder bodies in `gateway20_e2e.rs` for point CRUD, query, transactional batch, diagnostics validation, and the operator-override case. Each test provisions a fresh database+container against the live Gateway 2.0 account, drives the operation, and asserts the standard `CosmosDiagnostics` fields are populated.
The change-feed test stays empty until the SDK exposes a public change-feed API; `TransportKind` assertions are documented as future work pending CosmosDiagnostics exposure of the driver transport kind.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Wraps Gateway 2.0 dispatch with the Slice 3d cutover from a single EffectivePartitionKey RNTBD token to a payload that is *either* a point EPK token *or* an outer HTTP range header pair, never both.
- Replace effective_partition_key_bytes with effective_partition_key_payload that calls EffectivePartitionKey::compute_range and branches on start == end (point) vs strict prefix (range).
- Point ops (full PK or single-hash) keep emitting the EPK RNTBD token (0x005A); current proxy contract preserved.
- HPK partial-PK dispatches emit x-ms-thinclient-range-min/-max as outer HTTP headers (canonical un-padded hex), matching .NET's ProxyStartEpk/ProxyEndEpk and the spec's range header wire format.
- compute_range errors propagate as DataConversion (mapped to BadRequest upstream) rather than emit broken EPK metadata.
Adds three regression tests:
- wrap_emits_range_headers_for_hpk_prefix_partition_key
- wrap_emits_token_only_for_full_hpk_partition_key
- wrap_rejects_partition_key_with_too_many_components
Refs PR Azure#4319 follow-up: Slice 3d EPK cutover for queries/read-feeds.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
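A rough sketch of the either/or payload shape described above, assuming the range bounds have already been computed as hex strings. `EpkPayload`, `epk_payload`, and `outer_headers` are hypothetical names for illustration; only the two header names and the `start == end` branch come from the commit message.

```rust
/// Hypothetical stand-in for the dispatcher's EPK payload: exactly one of
/// the two forms is ever produced, never both.
#[derive(Debug, PartialEq, Eq)]
pub enum EpkPayload {
    /// Point operation: carried inside the RNTBD frame as the EPK token.
    PointToken(String),
    /// Partial hierarchical key: carried as outer HTTP headers.
    RangeHeaders { min: String, max: String },
}

pub const RANGE_MIN_HEADER: &str = "x-ms-thinclient-range-min";
pub const RANGE_MAX_HEADER: &str = "x-ms-thinclient-range-max";

/// Branches on `start == end` (point) versus anything else (treated as a range
/// in this sketch; the real code distinguishes a strict prefix).
pub fn epk_payload(start: &str, end: &str) -> EpkPayload {
    if start == end {
        EpkPayload::PointToken(start.to_owned())
    } else {
        EpkPayload::RangeHeaders {
            min: start.to_owned(),
            max: end.to_owned(),
        }
    }
}

/// Outer HTTP headers implied by a payload; empty for point operations.
pub fn outer_headers(payload: &EpkPayload) -> Vec<(&'static str, String)> {
    match payload {
        EpkPayload::PointToken(_) => Vec::new(),
        EpkPayload::RangeHeaders { min, max } => vec![
            (RANGE_MIN_HEADER, min.clone()),
            (RANGE_MAX_HEADER, max.clone()),
        ],
    }
}
```

Modelling the payload as an enum makes the "never both" invariant a property of the type rather than something each call site has to remember.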
Closes the spec test-coverage row 'HPK + Gateway 2.0: full vs partial PK'
on the SDK side: the unit-level header-emission proof was added with the
Slice 3d cutover; this commit guards the public SDK surface against
regressions where partial-PK queries would silently degrade.
The new test:
- Provisions a 3-component HPK container
(/tenantId, /userId, /sessionId).
- Inserts items spread across two tenants × two users × two sessions.
- Reads one item back via its full 3-component PK, exercising the
EPK-token point-op path.
- Queries with a 1-component prefix (tenantId only) — the dispatcher
emits x-ms-thinclient-range-min/-max — and asserts:
* at least one page is returned;
* every returned item belongs to the targeted tenant (no cross-tenant
bleed);
* the set of returned IDs matches the expected per-tenant set.
PartitionKey only has tuple From-impls for 2 and 3 components; the
1-component prefix is constructed from a Vec<PartitionKeyValue> so the
dispatcher sees an HPK partial PK rather than a single-hash key.
Refs PR Azure#4319 follow-up: HPK partial-PK paths.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The wrap path was dropping the inbound x-ms-continuation header before serializing the RNTBD frame, so paginated queries on Gateway 2.0 always restarted from page one and never advanced.
- Add RntbdRequestToken::ContinuationToken (0x0006, String) to mirror Java's RntbdRequestHeader.ContinuationToken and the same not-in-thinClientProxyExcludedSet behavior in .NET.
- Plumb the inbound x-ms-continuation header into the RNTBD metadata stream as a string token; values are passed through verbatim (including empty strings) for symmetry with the unwrap side and the .NET/Java implementations.
- Document the request/response continuation-token format in GATEWAY_20_SPEC.md.
- Add 3 driver-layer unit tests covering present/absent/empty header scenarios, plus an emulator-only E2E test (gateway20_query_paginates_via_continuation_tokens) that forces multi-page pagination and asserts no row is returned twice.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
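The present/absent/empty behavior can be illustrated with a small sketch. `StringToken` and `continuation_token` are hypothetical simplifications; the header map is assumed to use lowercase keys, and the actual RNTBD byte encoding is deliberately left out.

```rust
use std::collections::HashMap;

/// Token id named in the commit message for `ContinuationToken`.
pub const CONTINUATION_TOKEN_ID: u16 = 0x0006;

/// Hypothetical, simplified string token destined for the RNTBD metadata stream.
#[derive(Debug, PartialEq, Eq)]
pub struct StringToken {
    pub id: u16,
    pub value: String,
}

/// Maps the inbound `x-ms-continuation` header to a token.
/// Absent header: no token. Present header: value passed through verbatim,
/// including the empty string.
pub fn continuation_token(headers: &HashMap<String, String>) -> Option<StringToken> {
    headers.get("x-ms-continuation").map(|value| StringToken {
        id: CONTINUATION_TOKEN_ID,
        value: value.clone(),
    })
}
```

Returning `Option` keeps "header absent" distinct from "header present but empty", which is the distinction the three unit tests are described as covering.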
Pulls in PR Azure#4223 (Gateway 2.0 design spec) plus other release-branch work so the PR diff no longer shows GATEWAY_20_SPEC.md as a new file.
Conflict resolution:
- GATEWAY_20_SPEC.md: keep both new HEAD subsections (continuation-token format + capability-bit composition table); upstream side was empty.
Post-merge integration fixes (call sites broken by upstream signature changes):
- operation_pipeline.rs: add 'effective_consistency: Session' field to two TransportRequestContext literals in the batch-headers tests.
- driver_fault_injection.rs: add 'item1' item-id arg to three context.create_item(...) calls (matches new test_client.rs API).
- gateway20_e2e.rs / cosmos_fault_injection.rs: add item-id arg to seven container.create_item(...) calls (matches new SDK API).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Gateway 2.0 (the next-generation Cosmos DB dataplane transport) is now on by default whenever the account advertises a thin-client endpoint and HTTP/2 is allowed on the connection-pool options. Operators can still opt out per-client via `ConnectionPoolOptionsBuilder::with_gateway20_disabled(true)` (driver) or `CosmosClientBuilder::with_gateway20_disabled(true)` (SDK).
The doc comments on both opt-out methods (and on the underlying field + accessor) are rewritten to:
* Drop the negative-term explanation that framed Gateway 2.0 as a pre-GA opt-in. Gateway 2.0 is on by default and the docs now say so directly.
* Add a 'Latency caveat' section noting that Gateway 2.0 is not covered by the Cosmos DB regional latency SLA. Workloads with strict latency requirements should evaluate before relying on it.
Test impact: the only test asserting on the default is `connection_pool::tests::connection_pool_options_builder_defaults`; the other tests either set the flag explicitly or force HTTP/2 off (which short-circuits to disabled regardless of the default).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
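The default-on gating can be read as a conjunction of three conditions. The sketch below is illustrative only: `Gateway20Inputs` and `gateway20_enabled` are hypothetical names, and the real driver may consult additional state.

```rust
/// Hypothetical bundle of the inputs named in the commit message.
#[derive(Debug, Clone, Copy)]
pub struct Gateway20Inputs {
    /// Operator opt-out; `false` by default after this change.
    pub gateway20_disabled: bool,
    /// Whether HTTP/2 is allowed on the connection-pool options.
    pub http2_allowed: bool,
    /// Whether the account metadata advertises a Gateway 2.0 endpoint.
    pub account_has_gateway20_endpoint: bool,
}

/// Gateway 2.0 is used only when nothing rules it out.
pub fn gateway20_enabled(inputs: Gateway20Inputs) -> bool {
    !inputs.gateway20_disabled
        && inputs.http2_allowed
        && inputs.account_has_gateway20_endpoint
}
```

Forcing `http2_allowed` to `false` yields `false` regardless of the opt-out flag, which matches the short-circuit mentioned under test impact.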
The standalone `sdk/cosmos/ci-gateway20.yml` pipeline is removed. The
Gateway 2.0 ("thin client") live tests now run as a second
`LiveTestMatrixConfigs` entry on the main `sdk/cosmos/ci.yml`
pipeline. This mirrors how the Java Cosmos SDK plumbs its thin-client
live tests in `sdk/cosmos/tests.yml` (Azure/azure-sdk-for-java).
The new entry (`Cosmos_gateway20_live_test`) points at
`live-gateway20-matrix.json` and reuses the same `azure-sdk-tests-cosmos`
service connection. Two pipeline-level `EnvVars` are wired in so the
tests can connect to a pre-provisioned thin-client account that is not
created per-pipeline-run:
* `AZURE_COSMOS_GW20_ENDPOINT` ← `$(thinclient-test-endpoint)`
* `AZURE_COSMOS_GW20_KEY` ← `$(thinclient-test-key)`
(NOTE: those secret variable names follow Java's convention. They may
need to be renamed to whatever the Cosmos service connection actually
exposes; this can be verified on the next live-test run.)
The matrix machinery still requires `ArmTemplateParameters`, so the
deploy step continues to create a throwaway account; the Gateway 2.0
tests just ignore it and connect via the env vars instead. The
existing test-category gating (`gateway20` /
`gateway20_multi_region`) flows through the bicep template into
`COSMOS_RUSTFLAGS`, which gates the
`#[cfg(test_category="gateway20")]` test functions — independent of
the account the tests connect to.
Doc comments in `gateway20_e2e.rs` and the
`GATEWAY_20_SPEC.md` file-changes table are updated to reference the
consolidated pipeline.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
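For illustration, a test helper that resolves the pre-provisioned account from the two env vars might look like the sketch below. `Gateway20Account`, `account_from`, and `account_from_env` are hypothetical names; treating an empty value as missing is an assumption, not something the commit states.

```rust
pub const ENDPOINT_VAR: &str = "AZURE_COSMOS_GW20_ENDPOINT";
pub const KEY_VAR: &str = "AZURE_COSMOS_GW20_KEY";

/// Hypothetical holder for the pre-provisioned account's connection details.
#[derive(Debug, PartialEq, Eq)]
pub struct Gateway20Account {
    pub endpoint: String,
    pub key: String,
}

/// Resolves both variables through `lookup`, so the logic can be tested
/// without mutating the process environment.
pub fn account_from<F>(lookup: F) -> Option<Gateway20Account>
where
    F: Fn(&str) -> Option<String>,
{
    // Assumption: an empty value is treated the same as an unset variable.
    let endpoint = lookup(ENDPOINT_VAR).filter(|v| !v.is_empty())?;
    let key = lookup(KEY_VAR).filter(|v| !v.is_empty())?;
    Some(Gateway20Account { endpoint, key })
}

/// Convenience wrapper over the real process environment.
pub fn account_from_env() -> Option<Gateway20Account> {
    account_from(|name| std::env::var(name).ok())
}
```

A test gated on `#[cfg(test_category = "gateway20")]` could call `account_from_env()` and fail fast with a clear message when either variable is missing.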
Replaces the SDK's parallel fault-injection type system with re-exports
from azure_data_cosmos_driver::fault_injection. The duplicate types
existed only because the SDK and driver each had their own injection
module; with the driver fault-injection now wired into the gateway
transport via a shared rule slice, there is no reason to keep two
copies.
Removed:
- sdk/.../fault_injection/{condition,result,rule}.rs (~590 lines)
- driver_bridge::sdk_fi_rules_to_driver_fi_rules() and the entire
feature-gated translation block (~127 lines)
- The dual-state Arc<AtomicBool>/Arc<AtomicU32> wiring on the SDK rule —
no longer needed because both transports now share the same
Arc<FaultInjectionRule>.
- Dead 'passthrough_statuses' tracking on the SDK FaultClient.
Kept SDK-owned:
- FaultInjectionClientBuilder — produces the gateway-side FaultClient
HTTP transport.
- A small fault_operation_for_sdk(SdkOpType, SdkResType) adapter so
CosmosRequest::add_fault_injection_headers can stamp the right
operation tag using the SDK enum (the driver's
FaultOperationType::from_operation_and_resource takes driver enums).
Other touches:
- Promoted driver FaultInjectionRule::increment_hit_count to pub so the
SDK FaultClient (separate crate) can call it.
- CustomResponse construction in tests now uses CustomResponseBuilder
because driver fields are private; accessor calls (status_code(),
body(), region(), etc.) replace direct field reads.
- Forward-compat: the FaultInjectionErrorType match in apply_fault gets
a wildcard arm because the driver enum is #[non_exhaustive].
- Updated sdk-to-driver-cutover.md to describe the post-cutover
architecture and drop references to the deleted bridge function.
Net diff: -1005 / +144 lines.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
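The forward-compat wildcard arm mentioned above follows the usual pattern for `#[non_exhaustive]` enums. In the sketch below, `FaultErrorType`, its variants, and `injected_status` are hypothetical stand-ins, not the driver's `FaultInjectionErrorType`; note that `#[non_exhaustive]` only forces the wildcard arm on downstream crates, so within a single file it is shown purely for shape.

```rust
/// Hypothetical stand-in for a driver-owned error-type enum.
#[non_exhaustive]
#[derive(Debug, Clone, Copy)]
pub enum FaultErrorType {
    ServiceUnavailable,
    Timeout,
    ConnectionError,
}

/// Maps a fault type to an HTTP status to inject, if it has one.
pub fn injected_status(kind: FaultErrorType) -> Option<u16> {
    match kind {
        FaultErrorType::ServiceUnavailable => Some(503),
        FaultErrorType::Timeout => Some(408),
        // Wildcard arm: covers `ConnectionError` here and, in a downstream
        // crate, any variant the defining crate adds later.
        _ => None,
    }
}
```

Without the wildcard arm, a crate matching on a foreign `#[non_exhaustive]` enum does not compile, even if it lists every variant that exists today.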
Adds entries to both crate CHANGELOGs covering the changes from the prior three commits in this PR:
- Driver 0.3.0 (Features Added): Gateway 2.0 transport is now the default; the new `with_gateway20_disabled()` opt-out toggle, with the latency / SLA caveat surfaced explicitly so operators know what they're getting into.
- SDK 0.34.0 (Features Added): the matching public `CosmosClientBuilder::with_gateway20_disabled()` builder.
- SDK 0.34.0 (Breaking Changes): the fault-injection re-export consolidation. The SDK previously had its own `FaultInjectionRule`/`FaultInjectionCondition`/etc with public fields; field access is now via accessor methods on the driver re-exports.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Build Analyze CI stage runs `cargo doc --no-deps --all-features` and was failing with two issues introduced by the fault-injection re-export refactor (commit d010978):
1. `error: redundant explicit link target` on the `TransportKind` re-export. The explicit `(azure_data_cosmos_driver::diagnostics::TransportKind)` target is redundant because rustdoc resolves `[`TransportKind`]` to the same destination via the in-scope `pub use`.
2. `warning: public documentation for `fault_injection` links to private item `crate::operation_context::OperationType`` (and the matching warning for `ResourceType`). Rustdoc's `private_intra_doc_links` lint flags these because the SDK's internal `OperationType` / `ResourceType` are crate-private; the module-level prose can refer to them by bare name without any intra-doc link target.
Both fixes are docstring-only, with no behavioral change.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per GATEWAY_20_SPEC.md §35, 'Gateway 2.0' is the canonical name in
all Rust code, docs, and comments. 'Thin client' is reserved only for
(a) Java/.NET source symbol references and (b) literal wire-header
strings such as x-ms-thinclient-proxy-*.
Renamed (Rust):
- AccountProperties accessor methods on the driver-internal cache:
has_thin_client_endpoints → has_gateway20_endpoints
thin_client_writable_regions → gateway20_writable_regions
thin_client_readable_regions → gateway20_readable_regions
- routing_systems::parse_thin_client_locations and its parameter /
local-variable names → parse_gateway20_locations.
- E2E test fn gateway20_query_streams_through_thin_client →
gateway20_query_streams.
- Test fixture URL host central.thinclient.azure.com →
central.gateway20.azure.com in operation_pipeline.rs.
Renamed (docs/comments/CI):
- Doc comments and inline comments referencing 'thin client' /
'thin-client' across CHANGELOGs, connection_pool, constants,
adaptive_transport, account_metadata_cache, diagnostics_context,
operation_pipeline, routing_systems, fault-injection tests,
gateway20 pipeline tests, GATEWAY_20_SPEC, TRANSPORT_PIPELINE_SPEC,
cosmos_client_builder (public-facing API doc), gateway20_e2e
module doc, and ci.yml comments.
- ci.yml secret variable names $(thinclient-test-endpoint) /
$(thinclient-test-key) → $(gateway20-test-endpoint) /
$(gateway20-test-key), mirroring the AZURE_COSMOS_GW20_*
env-var naming convention. NOTE: the corresponding KeyVault-backed
entries in the azure-sdk-tests-cosmos service connection's
variable group may need a matching rename by engsys.
- Log messages 'non-HTTPS thin-client endpoint URL' / 'Duplicate
thin-client region with conflicting URL' rewritten to use
'Gateway 2.0'.
- Dropped public-facing parentheticals like 'Gateway 2.0 ("thin
client")' from CHANGELOG and CosmosClientBuilder docs per the
user request to stop explaining the legacy term.
Kept (per policy exceptions):
- All x-ms-thinclient-* / x-ms-cosmos-use-thinclient wire header
strings and the inline-doc comment block in constants.rs that
explains why those wire names are unchanged (server-defined).
- All Java/.NET symbol references (ThinClientStoreModel,
ThinClientStoreClient, ThinClientHttpMessageHandler,
thinClientProxyExcludedSet, useThinClientStoreModel, …) in spec
text.
- AccountProperties::thin_client_*_locations field names — they
mirror the wire JSON properties via #[serde(rename_all =
"camelCase")] and renaming would require explicit
#[serde(rename = "…")] decorations. Added explanatory note to
the field doc comments.
- GATEWAY20_USE_THINCLIENT constant identifier — wire-string-mirroring
suffix.
- GATEWAY_20_SPEC.md historical 'formerly "thin client"' note,
cspell directive, and the naming policy itself.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implements Cosmos DB Gateway 2.0 (a.k.a. "thin client") in the Rust driver and SDK, per the design spec merged in #4223.
This is a multi-slice implementation that adds:
- `is_gateway20_allowed` pool flag, fallback to standard gateway, and shared dispatch state (Slice 3b/c)
Commits
- 83632099802e479: `TransportKind::Gateway20`, account-properties probe wiring, fallibility plumbing
- c475d876963626: `is_gateway20_allowed` flag; standard-gateway fallback
- 56890a7: `AcqRel` ordering for Gateway 2.0 transport request id (… `Relaxed` but `AcqRel` improves diagnostic-trace clarity)
- 27218fb: Phase 6: testing & infrastructure
CI
- `sdk/cosmos/ci-gateway20.yml`: dedicated PR-and-manual live-test pipeline. Reads `AZURE_COSMOS_GW20_ENDPOINT` / `AZURE_COSMOS_GW20_KEY` from pipeline secrets (the spec's Q2: the GW2.0 account is pre-provisioned out-of-band; standard ARM/Bicep cannot create one).
- `sdk/cosmos/live-gateway20-matrix.json`: `gateway20` (single region) + `gateway20_multi_region` test categories.
Driver pipeline tests (`gateway20_pipeline_tests.rs`, gated `#[cfg(feature = "__internal_mocking")]`)
A `CapturingTransport` + `CapturingFactory` pair that refuses every send and records the outgoing request. Active tests: "9"
Three live-account stubs (`#[ignore]`) lock in test names for stored-proc fallback, diagnostics validation, and SDK-boundary operator override.
Fault injection (additive)
- `driver_fault_injection.rs`: three new emulator-gated tests: 503 → regional failover, 408 → cross-region for reads, 404/1002 → remote-preferred without PKRange refresh. Each carries a `TODO(Phase 6)` describing how to scope to the GW2.0 transport once `FaultInjectionCondition::with_transport_kind` lands.
- `cosmos_fault_injection.rs`: one new GW2.0 ConnectionError → standard-gateway fallback contract lock.
SDK E2E (`gateway20_e2e.rs`)
Six placeholder tests gated on `test_category = "gateway20"` plus the two env vars. Bodies are intentionally `TODO(Phase 6)` until `CosmosClientOptions` exposes a public Gateway 2.0 toggle. The names lock in the contract: point CRUD, query streaming, transactional batch, change-feed (latest version), diagnostics validation, SDK-boundary operator override.
Verification
Two deep-review cycles were run against the stack: one on the impl Slices 1–3b/c (one Tier 2 finding, addressed in `56890a7`) and one on the Phase 6 commit (one Tier 2 consistency suggestion that was not actionable under repo conventions).
Documented follow-ups (not blocking design review of this stack, but BLOCK merge)
- `CosmosClientOptions::with_gateway20_disabled` does not exist, which is why the six E2E tests are scaffolds.
- `FaultInjectionCondition::with_transport_kind`: missing today; the four new fault tests cannot scope themselves to the Gateway 2.0 transport and run on whichever transport is selected at dispatch.
- `is_gateway20_allowed` → `gateway20_disabled` rename (R15): pool option pre-dates the spec's "negative-term flags only" rule.
- `ConsistencyLevel` token id is `0x0010`, not `0x00F0`.
- … `9` while Java uses `11`.
Implementation reference: #4223 (spec). See `sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md` for the full design.