From a25ae1f67dc79d213eaf88a1de80d697ef649f5a Mon Sep 17 00:00:00 2001
From: tvaron3
Date: Mon, 20 Apr 2026 11:29:20 -0700
Subject: [PATCH 01/48] Cosmos: add Gateway 2.0 design spec

Add design specification for Gateway 2.0 (formerly thin client) support
in the Rust Cosmos driver and SDK. The spec covers motivation, current
Rust state, a phased implementation plan (RNTBD protocol, request
pipeline, endpoint discovery, retry/errors, SDK integration, and a
dedicated live tests pipeline), and remaining open questions. Gateway
2.0 is auto-detected from account metadata and is not exposed as a
customer-facing configuration option.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .../docs/GATEWAY_20_SPEC.md | 429 ++++++++++++++++++
 1 file changed, 429 insertions(+)
 create mode 100644 sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md

diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md
new file mode 100644
index 00000000000..982c92f2031
--- /dev/null
+++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md
@@ -0,0 +1,429 @@
+# Gateway 2.0 Design Spec for Rust Driver & SDK
+
+## 1. Overview
+
+Gateway 2.0 (formerly "thin client") is a server-side proxy that allows SDK clients to route data-plane operations through a lightweight proxy endpoint instead of directly to backend replicas. It uses RNTBD binary protocol over HTTP/2, with the proxy handling partition routing, replica selection, and load balancing.
+
+**Naming**: Use "Gateway 2.0" consistently in all Rust code, docs, and comments. Avoid "thin client" except when referencing Java/.NET code.
+
+---
+
+## 2. Motivation
+
+### Why Gateway 2.0?
+
+Traditional Cosmos DB offers two connection modes:
+
+- **Gateway mode**: Simple HTTP/REST proxy — easy to use, but adds an extra network hop through a shared stateless gateway. No latency SLA guarantees because the gateway is a shared, best-effort proxy.
+- **Direct mode**: SDK connects directly to backend replicas via TCP/RNTBD — provides latency SLA guarantees but requires the SDK to manage replica discovery, connection pooling, and partition routing itself. This adds significant complexity to the SDK and requires direct network access to backend nodes.
+
+**Gateway 2.0 bridges this gap.** It is a gateway mode with SLA latency guarantees — combining the operational simplicity of gateway mode (single endpoint, no direct backend connectivity required, firewall-friendly) with the performance characteristics of direct mode (RNTBD binary protocol, server-side partition routing, replica-aware load balancing).
+
+### Key Benefits
+
+- **SLA latency guarantees** — Unlike traditional gateway, Gateway 2.0 provides contractual latency commitments comparable to direct mode
+- **Simplified networking** — Clients connect to a single regional proxy endpoint over HTTPS; no need to open firewall rules to individual backend replicas
+- **Reduced SDK complexity** — The proxy handles replica discovery, connection management, and partition-level routing; the SDK only needs RNTBD serialization and endpoint selection
+- **HTTP/2 multiplexing** — Multiple concurrent operations share a single TCP connection, reducing connection overhead vs. direct mode's per-replica TCP connections
+- **Transparent failover** — The proxy handles replica failover within a partition; the SDK handles regional failover across proxy endpoints
+
+### Design Philosophy
+
+Gateway 2.0 moves partition-level routing intelligence from the SDK into the server-side proxy while keeping regional routing in the SDK. This gives the best of both worlds:
+
+**SDK Responsibility:**
+
+- Regional endpoint selection
+- RNTBD serialization
+- EPK header injection
+
+**Gateway 2.0 Proxy (Server-Side):**
+
+- Partition routing
+- Replica selection
+- Load balancing
+
+### Connection Mode Comparison
+
+| Aspect | Gateway | Gateway 2.0 | Direct |
+| --- | --- | --- | --- |
+| Latency SLA | No | **Yes** | Yes |
+| Simple Network | Yes | Yes | No |
+| Protocol | HTTP/REST | RNTBD/HTTP2 | RNTBD/TCP |
+| Replica Mgmt | Proxy | Proxy | SDK |
+| Partition Route | Proxy | Proxy | SDK |
+| Regional Route | SDK | SDK | SDK |
+| SDK Complexity | Low | Medium | High |
+| Firewall Rules | 1 endpoint | 1 endpoint | N replicas |
+
+---
+
+## 3. Current Rust State
+
+The Rust driver (`azure_data_cosmos_driver`) already has significant gateway 2.0 scaffolding:
+
+### Already Implemented
+- **`CosmosEndpoint`** — `gateway20_url: Option` field, `regional_with_gateway20()`, `uses_gateway20()`, `selected_url()` methods
+- **`TransportMode::Gateway20`** enum variant in pipeline components
+- **`RoutingDecision`** — carries `transport_mode` that distinguishes gateway vs gateway 2.0
+- **`ConnectionPoolOptions`** — `is_gateway20_allowed: bool` config, env var `AZURE_COSMOS_CONNECTION_POOL_IS_GATEWAY20_ALLOWED`
+- **`CosmosTransport`** — `dataplane_gateway20_transport: OnceLock`, lazy init with `AdaptiveTransport::gateway20()`
+- **`AdaptiveTransport::ShardedGateway20`** variant — always HTTP/2 with prior knowledge
+- **`HttpClientConfig::dataplane_gateway20()`** — HTTP/2-only config
+- **`TransportKind::Gateway20`** in diagnostics
+- **`LocationStateStore`** — `gateway20_enabled` flag, passes through to endpoint construction
+- **Routing systems** — `build_account_endpoint_state()` resolves gateway 2.0 URLs from account properties
+- **`resolve_endpoint()`** in operation pipeline — selects gateway 2.0 URL when `prefer_gateway20` is true
+- **Constants** — `THINCLIENT_PROXY_OPERATION_TYPE`, `THINCLIENT_PROXY_RESOURCE_TYPE`, `START_EPK`, `END_EPK` headers defined
+- **EPK computation** — `get_hashed_partition_key_string()` already computes EPK in `container_connection.rs`
+- **Perf crate** — `gateway20_allowed` config wiring
+
+### Not Yet Implemented (Gaps)
+1. **RNTBD serialization/deserialization** — No binary protocol encoding/decoding exists
+2. **Gateway 2.0 header injection** — Thin client proxy headers (`x-ms-thinclient-proxy-operation-type`, `x-ms-thinclient-proxy-resource-type`, EPK range headers) are not applied to requests
+3. **Supported operation filtering** — No `IsOperationSupportedByThinClient()` equivalent
+4. **Gateway 2.0 endpoint discovery** — Verify account metadata parsing of `thinClientReadableLocations`/`thinClientWritableLocations`
+5. **Session token handling** — Gateway 2.0 may handle session tokens differently (partition-key-range-id prefix)
+6. **Gateway 2.0 specific retry logic** — Fallback from gateway 2.0 to standard gateway on specific errors
+7. **Integration/E2E tests** — No gateway 2.0 test coverage
+8. **Fault injection** — No gateway 2.0 fault injection scenarios
+
+---
+
+## 4. Rust Implementation Plan
+
+### Current Request Flow (Gateway 1.0)
+
+1. `CosmosClient::create_item(T)` calls `ContainerClient`
+2. `container_connection.rs` serializes `T` to `&[u8]`, computes EPK, resolves PKRange
+3. `CosmosDriver::execute_operation()` enters the Operation Pipeline (7-stage loop)
+4. `resolve_endpoint()` selects a gateway endpoint
+5. Transport Pipeline applies cosmos headers, signs request
+6. HTTP/REST request sent to Cosmos Gateway (shared proxy, no SLA)
+
+### Target Request Flow (Gateway 2.0)
+
+1. `CosmosClient::create_item(T)` calls `ContainerClient`
+2. `container_connection.rs` serializes `T` to `&[u8]`, computes EPK, resolves PKRange
+3. `CosmosDriver::execute_operation()` enters the Operation Pipeline (7-stage loop)
+4. `resolve_endpoint()` prefers gateway 2.0 endpoint (if available from account metadata)
+5. Transport Pipeline checks `is_supported_by_gw20()?`:
+   - **YES**: Inject gateway 2.0 headers + RNTBD serialize -> HTTP/2 POST to Gateway 2.0 Proxy (SLA)
+   - **NO**: Standard HTTP/REST request to Cosmos Gateway (fallback)
+
+---
+
+### Phase 1: RNTBD Protocol (Driver Layer)
+
+**Crate**: `azure_data_cosmos_driver`
+**New module**: `src/driver/transport/rntbd/`
+
+The RNTBD (Reliable Network Transfer Binary Data) protocol is the wire format used by Cosmos DB for efficient binary communication. Gateway 2.0 wraps RNTBD-encoded payloads inside HTTP/2 POST requests to the proxy.
+
+#### What Will Be Done
+
+- **`rntbd/mod.rs`** — Module root, public types
+- **`rntbd/request.rs`** — Request serialization: operation headers, resource metadata, partition key info → binary payload
+- **`rntbd/response.rs`** — Response deserialization: 24-byte frame header → metadata section → optional body payload
+- **`rntbd/tokens.rs`** — RNTBD token types (type IDs, lengths, value encodings) used in metadata sections
+- **`rntbd/status.rs`** — RNTBD status code mapping to `CosmosStatus`
+
+#### RNTBD Request Wire Format
+
+There is no version negotiation for thin client RNTBD. The frame format is fixed (derived from Java `RntbdRequestFrame`):
+
+| Offset | Size | Field | Encoding |
+| --- | --- | --- | --- |
+| 0 | 4 | Total message length | uint32 LE (frame + metadata + payload) |
+| 4 | 2 | Resource type | uint16 LE |
+| 6 | 2 | Operation type | uint16 LE |
+| 8 | 16 | Activity ID | UUID (two uint64 LE) |
+| 24 | var | Metadata tokens | Token stream (filtered by `thinClientProxyExcludedSet`) |
+| 24+N | 4 | Payload length | uint32 LE (only if payload present) |
+| 28+N | var | Payload body | Raw bytes (JSON or Cosmos binary) |
+
+Frame header is 24 bytes. The `encode(byteBuf, forThinClient=true)` flag in Java excludes certain headers that the proxy does not need.
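The request layout in the table above can be sketched as a single Rust encoding function. This is a minimal sketch and not the planned `serialize_request()`. The names `RequestFrame`, `encode_request` and `FRAME_HEADER_LEN` are hypothetical, and the metadata token stream is passed in as bytes that are already encoded and filtered.

The table leaves two details open, and the sketch fixes them by assumption:

- The leading length counts every byte of the message, including the length field itself and the payload-length prefix.
- The activity ID is written as its high 64 bits followed by its low 64 bits, each little-endian.

Both points should be checked against the Java `RntbdRequestFrame` encoder before anyone relies on them.

```rust
use std::convert::TryFrom;

/// Fixed size of the RNTBD request frame header (length, resource type,
/// operation type, activity ID), per the request wire-format table.
pub const FRAME_HEADER_LEN: usize = 24;

/// Hypothetical input for one Gateway 2.0 RNTBD request (not a driver type).
pub struct RequestFrame<'a> {
    pub resource_type: u16,
    pub operation_type: u16,
    /// Activity ID as a 128-bit integer.
    pub activity_id: u128,
    /// Metadata token stream, assumed to be encoded (and filtered) elsewhere.
    pub metadata: &'a [u8],
    /// Optional request body (JSON or Cosmos binary).
    pub payload: Option<&'a [u8]>,
}

/// Encodes a request frame into one buffer (hypothetical helper).
///
/// Assumptions: the leading length counts every byte of the message,
/// including itself and the payload-length prefix; the activity ID is
/// written high 64 bits first, then low 64 bits, each little-endian.
pub fn encode_request(frame: &RequestFrame<'_>) -> Result<Vec<u8>, &'static str> {
    // The payload section (length prefix + body) exists only when there is a body.
    let payload_section = frame.payload.map_or(0, |body| 4 + body.len());
    let total = FRAME_HEADER_LEN + frame.metadata.len() + payload_section;
    let total_u32 = u32::try_from(total).map_err(|_| "message exceeds u32::MAX bytes")?;

    let mut buf = Vec::with_capacity(total);
    buf.extend_from_slice(&total_u32.to_le_bytes()); // offset 0, 4 bytes
    buf.extend_from_slice(&frame.resource_type.to_le_bytes()); // offset 4, 2 bytes
    buf.extend_from_slice(&frame.operation_type.to_le_bytes()); // offset 6, 2 bytes
    let high = (frame.activity_id >> 64) as u64;
    let low = frame.activity_id as u64;
    buf.extend_from_slice(&high.to_le_bytes()); // offset 8, 8 bytes
    buf.extend_from_slice(&low.to_le_bytes()); // offset 16, 8 bytes
    buf.extend_from_slice(frame.metadata); // offset 24, N bytes

    if let Some(body) = frame.payload {
        // Cannot overflow: body.len() < total, and total already fit in a u32.
        buf.extend_from_slice(&(body.len() as u32).to_le_bytes()); // offset 24+N
        buf.extend_from_slice(body); // offset 28+N
    }

    debug_assert_eq!(buf.len(), total);
    Ok(buf)
}
```

The whole message is written into one `Vec<u8>` that is sized up front, so the buffer never reallocates. A production encoder might instead write into a reusable buffer owned by the transport.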
+
+#### RNTBD Response Wire Format
+
+| Offset | Size | Field | Encoding |
+| --- | --- | --- | --- |
+| 0 | 4 | Total message length | uint32 LE |
+| 4 | 4 | Status code | uint32 LE |
+| 8 | 16 | Activity ID | UUID (two uint64 LE) |
+| 24 | var | Metadata tokens | Token stream (request charge, session token, continuation, etc.) |
+| 24+N | var | Body payload | Raw bytes (optional) |
+
+#### Files Changed
+
+```
+NEW src/driver/transport/rntbd/mod.rs — Module root + public types
+NEW src/driver/transport/rntbd/request.rs — serialize_request() → Vec
+NEW src/driver/transport/rntbd/response.rs — deserialize_response(&[u8]) → RntbdResponse
+NEW src/driver/transport/rntbd/tokens.rs — Token type definitions + encoding
+NEW src/driver/transport/rntbd/status.rs — RNTBD ↔ CosmosStatus mapping
+EDIT src/driver/transport/mod.rs — Add `pub(crate) mod rntbd;`
+```
+
+---
+
+### Phase 2: Gateway 2.0 Request Pipeline (Driver Layer)
+
+**Crate**: `azure_data_cosmos_driver`
+
+This phase wires RNTBD serialization into the existing transport pipeline and adds gateway 2.0-specific header injection.
+
+#### What Will Be Done
+
+- **Operation filtering** — New function `is_operation_supported_by_gateway20(resource_type, operation_type) → bool`, the Rust counterpart of .NET's `IsOperationSupportedByThinClient()`. Only data-plane document operations (`ResourceType::Document`) are eligible; stored procedure Execute is excluded, following Java (see the Supported Operations table).
+- **Header injection** — When `TransportMode::Gateway20`, inject thin client headers before sending:
+  - `x-ms-thinclient-proxy-operation-type` — numeric operation type
+  - `x-ms-thinclient-proxy-resource-type` — numeric resource type
+  - `x-ms-effective-partition-key` — EPK hash for point operations
+  - `x-ms-thinclient-range-min` / `x-ms-thinclient-range-max` — EPK range for feed operations
+- **Request body wrapping** — Serialize the entire request (headers + body) into RNTBD binary format and POST as the HTTP/2 body
+- **Response unwrapping** — Deserialize the RNTBD response body back into `CosmosResponseHeaders` + raw document bytes
+- **Fallback** — If operation is not supported by gateway 2.0, transparently route through standard gateway
+
+#### Supported Operations
+
+Only `ResourceType::Document` is eligible for gateway 2.0 (following Java's approach):
+
+| Operation | Supported | Notes |
+| --- | --- | --- |
+| Create | Yes | |
+| Read | Yes | |
+| Replace | Yes | |
+| Upsert | Yes | |
+| Delete | Yes | |
+| Patch | Yes | |
+| Query | Yes | |
+| QueryPlan | Yes | |
+| ReadFeed | Yes | LatestVersion change feed only; excludes AllVersionsAndDeletes |
+| Batch | Yes | Includes bulk operations |
+| StoredProcedure Execute | **No** | Following Java; only `ResourceType::Document` is eligible |
+| All other resource types | **No** | Metadata operations use standard gateway |
+
+#### Gateway 2.0 Header Injection Flow
+
+When `transport_mode == Gateway20`:
+
+1. Set `x-ms-thinclient-proxy-operation-type` (numeric operation type)
+2. Set `x-ms-thinclient-proxy-resource-type` (numeric resource type)
+3. Set the EPK headers:
+   - Point operation: set `x-ms-effective-partition-key` (EPK hash)
+   - Feed operation: set `x-ms-thinclient-range-min` and `x-ms-thinclient-range-max`
+4. Serialize headers + body into RNTBD binary format
+5. POST RNTBD body to gateway 2.0 endpoint via HTTP/2
+
+When `transport_mode != Gateway20`: Standard HTTP/REST request (existing flow, unchanged)
+
+#### Files Changed
+
+```
+NEW src/driver/transport/gateway20.rs — inject_gateway20_headers(), RNTBD wrap/unwrap
+EDIT src/driver/transport/transport_pipeline.rs — Branch on TransportMode in execute_transport_pipeline()
+EDIT src/driver/transport/cosmos_headers.rs — Add gateway 2.0 header application
+EDIT src/driver/transport/mod.rs — Add is_operation_supported_by_gateway20()
+EDIT src/driver/pipeline/components.rs — Add EPK fields to TransportRequest if needed
+```
+
+---
+
+### Phase 3: Endpoint Discovery (Driver Layer)
+
+**Crate**: `azure_data_cosmos_driver`
+
+This phase is largely already implemented. The remaining work is confirming the endpoint URL pattern and ensuring the account metadata cache properly resolves gateway 2.0 URLs.
+
+#### What Will Be Done
+
+- **Verify** account metadata cache already parses `thin_client_readable_locations` / `thin_client_writable_locations` into gateway 2.0 URLs
+- **Confirm** `build_account_endpoint_state()` in routing_systems.rs correctly constructs `CosmosEndpoint::regional_with_gateway20()`
+- **Test** endpoint discovery with live account that has gateway 2.0 enabled
+- **Add** `x-ms-cosmos-use-thinclient` header to account metadata requests to trigger gateway 2.0 endpoint advertisement
+
+#### Endpoint Discovery Flow (Existing)
+
+Account metadata response includes:
+
+- `writableLocations` — standard gateway URLs
+- `readableLocations` — standard gateway URLs
+- `thinClientWritableLocations` — gateway 2.0 URLs (when available)
+- `thinClientReadableLocations` — gateway 2.0 URLs (when available)
+
+`build_account_endpoint_state()` matches regions across these lists and constructs `CosmosEndpoint::regional_with_gateway20(region, gw_url, gw20_url)`. The resulting `AccountEndpointState` contains endpoints with `gateway20_url: Some(...)` when gateway 2.0 is available for that region.
+
+#### Files Changed
+
+```
+EDIT src/driver/cache/account_metadata_cache.rs — Verify/fix thin client endpoint parsing
+EDIT src/driver/transport/cosmos_headers.rs — Add x-ms-cosmos-use-thinclient header
+TEST src/driver/routing/routing_systems.rs — Add tests for gateway 2.0 endpoint construction
+```
+
+---
+
+### Phase 4: Retry & Error Handling (Driver Layer)
+
+**Crate**: `azure_data_cosmos_driver`
+
+Retry policies are identical between thin client and standard gateway modes in both Java and .NET — only endpoint selection and request encoding differ. The existing retry pipeline should work as-is for most cases.
+
+#### What Will Be Done
+
+- **Timeout policy** — Gateway 2.0 requests may use different timeout values than standard gateway
+- **Read timeout cross-region retry** — On HTTP 408 with `GATEWAY_ENDPOINT_READ_TIMEOUT` sub-status, retry read operations in the next preferred region
+- **Service unavailable (503)** — Mark endpoint unavailable for partition key range, then retry. Follow Java's conservative approach: only retry server-returned 503 or SDK-generated 503 with `SERVER_GENERATED_410` sub-status
+- **Gone (410)** — Action depends on sub-status code:
+  - `PARTITION_KEY_RANGE_GONE` (1002): Refresh PKRange cache, retry
+  - `COMPLETING_SPLIT_OR_MERGE` (1007): Refresh PKRange cache, retry
+  - `COMPLETING_PARTITION_MIGRATION` (1008): Refresh PKRange cache, retry
+  - `NAME_CACHE_IS_STALE` (1000): Refresh **collection** cache (NOT PKRange), retry
+  - Other sub-statuses: Retry with backoff, no cache refresh
+- **Gateway 2.0 fallback** — On persistent gateway 2.0 failures (e.g., proxy down), fall back to standard gateway for the remainder of the operation
+
+#### Retry Decision Table
+
+| Response | Sub-Status | Action |
+| --- | --- | --- |
+| 200-299 | — | Success |
+| 404 | — | Not Found (propagate to caller) |
+| 408 Timeout | — | Read: retry cross-region; Write: retry local only |
+| 410 Gone | 1002 (PKRangeGone) | Refresh PKRange cache, retry |
+| 410 Gone | 1007 (SplitMerge) | Refresh PKRange cache, retry |
+| 410 Gone | 1008 (PartitionMigration) | Refresh PKRange cache, retry |
+| 410 Gone | 1000 (NameCacheStale) | Refresh **collection** cache, retry |
+| 410 Gone | other | Retry with backoff |
+| 429 Throttled | — | Existing throttle retry loop (unchanged) |
+| 449 Retry With | — | Retry same region (transient conflict) |
+| 503 Unavailable | server-returned | Mark endpoint unavailable, failover |
+| 503 Unavailable | SDK-generated | Only retry if `SERVER_GENERATED_410` sub-status |
+| Proxy unreachable | — | Fallback to `TransportMode::Gateway` (standard HTTP/REST) |
+
+#### Files Changed
+
+```
+EDIT src/driver/pipeline/operation_pipeline.rs — Gateway 2.0 retry classification
+EDIT src/driver/pipeline/components.rs — Add Gateway20Fallback variant if needed
+EDIT src/driver/transport/transport_pipeline.rs — Timeout policy for gateway 2.0
+NEW src/driver/pipeline/gateway20_retry.rs — Gateway 2.0 specific retry logic (if needed)
+```
+
+---
+
+### Phase 5: SDK Integration
+
+**Crate**: `azure_data_cosmos`
+
+Gateway 2.0 is **not exposed as a customer-facing configuration**. The SDK automatically uses gateway 2.0 when the account metadata advertises thin client endpoints. This matches the design philosophy of both Java and .NET SDKs.
+
+#### What Will Be Done
+
+- **Auto-detection** — When account metadata includes `thinClientReadableLocations` / `thinClientWritableLocations`, the driver automatically prefers gateway 2.0 for eligible operations. No user opt-in required.
+- **Internal config** — The existing `ConnectionPoolOptions.is_gateway20_allowed` remains internal-only (not exposed in `CosmosClientOptions`). It serves as a kill switch for testing or emergency fallback, not a user-facing setting.
+- **Diagnostics** — `CosmosDiagnostics` should report when a request used gateway 2.0 vs standard gateway (already partially done via `TransportKind::Gateway20`)
+- **User agent** — Update SDK user agent string to indicate gateway 2.0 capability
+- **Container connection** — Ensure EPK is computed and available for gateway 2.0 header injection in the driver layer
+
+#### Auto-Detection Flow
+
+When account metadata includes `thinClientReadableLocations`, gateway 2.0 is enabled automatically (internal). `CosmosEndpoint` gets `gateway20_url` and `resolve_endpoint()` prefers Gateway 2.0. No user configuration needed — transparent to the caller.
+
+#### Files Changed
+
+```
+EDIT src/driver_bridge.rs — Ensure internal config passes through
+EDIT src/handler/container_connection.rs — Ensure EPK available for driver
+EDIT src/constants.rs — Any new header constants
+```
+
+---
+
+### Phase 6: Testing
+
+Testing covers all layers from unit to E2E, matching or exceeding Java/.NET test coverage.
+
+#### Live Tests Pipeline
+
+A **new dedicated CI pipeline** is required for gateway 2.0 live tests. Gateway 2.0 requires a Cosmos DB account with thin client endpoints enabled, which is separate from the standard emulator and live test infrastructure.
+
+**Trigger:** PR changes to `sdk/cosmos/**` + manual dispatch
+
+**Provision:**
+
+- Use a **dedicated, pre-provisioned Cosmos DB account** with gateway 2.0 / thin client endpoints enabled (hardcoded for this pipeline, reused across runs)
+- Account credentials stored in pipeline secrets (e.g., `AZURE_COSMOS_GW20_ENDPOINT`, `AZURE_COSMOS_GW20_KEY`)
+- Multi-region configuration (at least 2 regions)
+- Verify `thinClientReadableLocations` in account metadata at pipeline start
+
+**Test Matrix:**
+
+- Single-region gateway 2.0
+- Multi-region gateway 2.0 with failover
+- Gateway 2.0 + standard gateway fallback
+
+**Test Suites:**
+
+- Point CRUD (create, read, replace, upsert, patch, delete)
+- Query (single-partition, cross-partition)
+- Batch operations
+- Change feed (LatestVersion)
+- Retry scenarios (408, 410, 503)
+- Diagnostics validation (`TransportKind::Gateway20`)
+
+**Artifacts:** Test results (JUnit XML), diagnostics logs, perf metrics (RU, latency)
+
+#### Pipeline Files
+
+| Action | File | Purpose |
+| --- | --- | --- |
+| NEW | `sdk/cosmos/ci-gateway20.yml` | Gateway 2.0 live tests pipeline definition (uses pre-provisioned account) |
+| EDIT | `sdk/cosmos/live-platform-matrix.json` | Add gateway 2.0 test matrix entry |
+
+#### Test Coverage Matrix
+
+| Test Category | Unit | Integration | E2E | Scenarios |
+| --- | --- | --- | --- | --- |
+| RNTBD serialization | Yes | | | Round-trip, edge cases, malformed input |
+| EPK computation | Yes | | | Single/hierarchical PK, hash versions 1 and 2 |
+| Operation filtering | Yes | | | All ResourceType x OperationType combos |
+| Header injection | Yes | | | Point vs feed EPK headers, proxy type headers |
+| Gateway 2.0 transport | Yes | Yes | | Correct HTTP/2 config, sharded pool selection |
+| Point CRUD | | | Yes | Create, read, replace, upsert, patch, delete |
+| Query | | | Yes | SQL query, cross-partition |
+| Batch | | | Yes | Transactional batch ops |
+| Change feed | | | Yes | LatestVersion, incremental |
+| Retry: 408 timeout | | Yes | | Cross-region for reads, local-only for writes |
+| Retry: 503 | | Yes | | Regional failover |
+| Retry: 410 Gone | | Yes | | PKRange refresh (sub-status specific) |
+| Gateway 2.0 fallback | | Yes | | Proxy down -> standard gateway |
+| Multi-region failover | | Yes | Yes | Preferred regions, failover |
+| Fault injection | | Yes | | Timeout, 503, network error |
+| Perf benchmarks | | | Yes | Already wired in perf crate |
+| Diagnostics validation | Yes | Yes | | TransportKind::Gateway20 in diagnostics output |
+
+#### Files Changed
+
+| Action | File | Purpose |
+| --- | --- | --- |
+| NEW | `tests/gateway20_rntbd_tests.rs` | RNTBD unit tests (driver) |
+| NEW | `tests/gateway20_pipeline_tests.rs` | Header injection + operation filtering (driver) |
+| NEW | `tests/emulator_tests/gateway20_e2e.rs` | E2E tests (SDK, requires emulator) |
+| EDIT | `tests/emulator_tests/cosmos_fault_injection.rs` | Add gateway 2.0 fault scenarios |
+| EDIT | `azure_data_cosmos_perf/src/runner.rs` | Perf config already wired |
+
+---
+
+### Phase Dependency Graph
+
+- **Phase 1** (RNTBD Protocol) and **Phase 3** (Endpoint Discovery) can proceed in parallel
+- **Phase 2** (Request Pipeline) depends on Phase 1 and Phase 3
+- **Phase 4** (Retry/Errors) and **Phase 5** (SDK Integration) depend on Phase 2
+- **Phase 6** (Testing) depends on all previous phases
+
+---
+
+## 5. Open Questions
+
+1. **HTTP/2 prior knowledge vs ALPN**: Rust already configures gateway 2.0 as HTTP/2 with prior knowledge — confirm this matches service expectations.
+2. **Live test account provisioning**: Cosmos DB account configuration flags required to enable gateway 2.0 / thin client endpoints are not part of the standard Bicep templates. **Decision**: hardcode a dedicated, pre-provisioned thin client account for the gateway 2.0 live tests pipeline and reuse it across runs (rather than provisioning per-run via Bicep). The account name and credentials should be stored in pipeline secrets, with the pipeline reading the endpoint from environment variables.

From fa33adf431001e42354dd259d87abf2898251043 Mon Sep 17 00:00:00 2001
From: tvaron3
Date: Mon, 20 Apr 2026 13:18:18 -0700
Subject: [PATCH 02/48] Cosmos: address PR deep-review findings on Gateway 2.0 spec

Addresses all 21 findings from the PR Deep Reviewer run on #4223:

- Fix EPK attribution: canonical path is the driver's
  EffectivePartitionKey::compute/compute_range. Flag the SDK-side
  get_hashed_partition_key_string as known-broken for MultiHash and
  must not be wired to Gateway 2.0 header injection.
- Resolve stored-procedure contradiction: follow Java (no SP via thin
  client); explicitly reject .NET's ExecuteJavaScript allowance.
- Keep RNTBD field widths as uint16 LE, cite Java
  RntbdRequestFrame.encode's writeShortLE. Reviewer's uint32 claim was
  incorrect for thin client.
- Add single-source-of-truth gating model (section 3.4) with invariants.
- Add fallback taxonomy (eligibility vs failure), read/write region
  pairing, PLF precedence, env var reframed as unsupported override.
- Standardize on is_operation_supported_by_gateway20().
- Add Status/Date/Authors header, numbered TOC, Q1 through Q4 open
  questions, Related Specs cross-links, header-name wire-to-constant
  table, range-header wire format notes, wire-format ambiguity
  resolutions (length-inclusive, UUID byte order, payload-presence).
- Expand test matrix with error-case EPK, StoredProc-rejected
  assertion, Bulk vs Batch, failure-fallback unwind, PLF precedence.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .../docs/GATEWAY_20_SPEC.md | 307 ++++++++++++------
 1 file changed, 208 insertions(+), 99 deletions(-)

diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md
index 982c92f2031..e69486863a7 100644
--- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md
+++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md
@@ -1,10 +1,32 @@
 # Gateway 2.0 Design Spec for Rust Driver & SDK
 
+**Status**: Draft / Iterating
+**Date**: 2026-04-20
+**Authors**: (team)
+
+---
+
+## Table of Contents
+
+1. [Overview](#1-overview)
+2. [Motivation](#2-motivation)
+3. [Current Rust State](#3-current-rust-state)
+4. [Rust Implementation Plan](#4-rust-implementation-plan)
+5. [Open Questions](#5-open-questions)
+
+### Related Specs
+
+- [`TRANSPORT_PIPELINE_SPEC.md`](./TRANSPORT_PIPELINE_SPEC.md) — sharded HTTP/2 transport, timeout regime, hedging, `(HttpClient, host:port)` shard key. Gateway 2.0 reuses the sharded transport defined there verbatim; this spec does **not** introduce a new timeout or hedging policy.
+- [`PARTITION_KEY_RANGE_CACHE_SPEC.md`](./PARTITION_KEY_RANGE_CACHE_SPEC.md) — PKRange cache semantics and `EffectivePartitionKey` usage; cited by Phase 2 for EPK computation and by Phase 4 for 410 handling.
+- [`PARTITION_LEVEL_FAILOVER_SPEC.md`](./PARTITION_LEVEL_FAILOVER_SPEC.md) — per-partition region override semantics; cited by Phase 4 for PLF precedence over Gateway 2.0 routing.
+
+---
+
 ## 1. Overview
 
 Gateway 2.0 (formerly "thin client") is a server-side proxy that allows SDK clients to route data-plane operations through a lightweight proxy endpoint instead of directly to backend replicas. It uses RNTBD binary protocol over HTTP/2, with the proxy handling partition routing, replica selection, and load balancing.
 
-**Naming**: Use "Gateway 2.0" consistently in all Rust code, docs, and comments. Avoid "thin client" except when referencing Java/.NET code.
+**Naming**: Use "Gateway 2.0" consistently in all Rust code, docs, and comments. Avoid "thin client" except when referencing Java/.NET code or existing constants (`THINCLIENT_*`).
 
 ---
 
@@ -60,56 +82,87 @@ Gateway 2.0 moves partition-level routing intelligence from the SDK into the ser
 
 ## 3. Current Rust State
 
-The Rust driver (`azure_data_cosmos_driver`) already has significant gateway 2.0 scaffolding:
+The Rust driver (`azure_data_cosmos_driver`) already has significant gateway 2.0 scaffolding.
+
+### 3.1 Already Implemented — endpoint & transport
 
-### Already Implemented
 - **`CosmosEndpoint`** — `gateway20_url: Option` field, `regional_with_gateway20()`, `uses_gateway20()`, `selected_url()` methods
 - **`TransportMode::Gateway20`** enum variant in pipeline components
 - **`RoutingDecision`** — carries `transport_mode` that distinguishes gateway vs gateway 2.0
-- **`ConnectionPoolOptions`** — `is_gateway20_allowed: bool` config, env var `AZURE_COSMOS_CONNECTION_POOL_IS_GATEWAY20_ALLOWED`
+- **`ConnectionPoolOptions`** — `is_gateway20_allowed: bool` config (see §3.4 for gating model)
 - **`CosmosTransport`** — `dataplane_gateway20_transport: OnceLock`, lazy init with `AdaptiveTransport::gateway20()`
 - **`AdaptiveTransport::ShardedGateway20`** variant — always HTTP/2 with prior knowledge
-- **`HttpClientConfig::dataplane_gateway20()`** — HTTP/2-only config
+- **`HttpClientConfig::dataplane_gateway20()`** — HTTP/2-only config (see Open Question Q1 about prior-knowledge vs ALPN)
 - **`TransportKind::Gateway20`** in diagnostics
+
+### 3.2 Already Implemented — account metadata & routing
+
 - **`LocationStateStore`** — `gateway20_enabled` flag, passes through to endpoint construction
-- **Routing systems** — `build_account_endpoint_state()` resolves gateway 2.0 URLs from account properties
-- **`resolve_endpoint()`** in operation pipeline — selects gateway 2.0 URL when `prefer_gateway20` is true
-- **Constants** — `THINCLIENT_PROXY_OPERATION_TYPE`, `THINCLIENT_PROXY_RESOURCE_TYPE`, `START_EPK`, `END_EPK` headers defined
-- **EPK computation** — `get_hashed_partition_key_string()` already computes EPK in `container_connection.rs`
+- **`AccountProperties::has_thin_client_endpoints()`** (`account_metadata_cache.rs:191`) — detection helper
+- **`AccountProperties::thin_client_writable_regions()` / `thin_client_readable_regions()`** (`account_metadata_cache.rs:197,205`) — region accessors
+- **`parse_thin_client_locations()`** — parser for `thinClient(Readable|Writable)Locations`
+- **`build_account_endpoint_state()`** (`routing_systems.rs`) — resolves gateway 2.0 URLs from account properties
+- **Existing tests** in `routing_systems.rs:218–289` already exercise GW20 endpoint construction with both readable and writable thin-client locations
+- **`resolve_endpoint()`** in operation pipeline — selects gateway 2.0 URL when `prefer_gateway20` is true (see §3.4)
+
+### 3.3 Already Implemented — EPK & constants
+
+- **`EffectivePartitionKey::compute()` / `::compute_range()`** — in `azure_data_cosmos_driver::models::effective_partition_key` (MultiHash-aware, hierarchical-PK correct). This is the canonical path and is what Gateway 2.0 header injection MUST call. Both functions return `azure_core::Result` (per PR #4087 review: MultiHash-requires-V2 and component-count checks are runtime errors, not `debug_assert`s, so a user gets `Err` rather than a panic on malformed input).
+- **Constants** (in `azure_data_cosmos::constants`): `THINCLIENT_PROXY_OPERATION_TYPE` → `x-ms-thinclient-proxy-operation-type`, `THINCLIENT_PROXY_RESOURCE_TYPE` → `x-ms-thinclient-proxy-resource-type`, `START_EPK` → `x-ms-start-epk`, `END_EPK` → `x-ms-end-epk`. Phase 2 reuses these verbatim (see §Phase 2 "Header naming" for mapping).
- **Perf crate** — `gateway20_allowed` config wiring -### Not Yet Implemented (Gaps) +### 3.4 Gating model (single source of truth) + +Two independent guards exist today (`is_gateway20_allowed` is checked in both routing and `get_dataplane_transport`). Per PR #3942 review, **routing is the single source of truth**; the transport-layer guard is intentional defense-in-depth and is technically dead code given current callers. + +Invariants this spec locks in: + +- `prefer_gateway20` is computed **once per request** during `resolve_endpoint` from: + `connection_pool().is_gateway20_allowed() && account.has_thin_client_endpoints()` +- After `resolve_endpoint`, downstream stages MUST trust `RoutingDecision.transport_mode` and not re-derive eligibility. +- `ConnectionPoolOptions.is_gateway20_allowed` and its env var `AZURE_COSMOS_CONNECTION_POOL_IS_GATEWAY20_ALLOWED` are an **unsupported, undocumented kill switch** reserved for emergency fallback. They are NOT exposed on `CosmosClientOptions` and may be removed without notice. + +### 3.5 Known broken / do-not-use + +- **`azure_data_cosmos::hash::get_hashed_partition_key_string()`** (called from `container_connection.rs:87`) — a legacy SDK-side function that is **a known-broken stub for MultiHash (hierarchical-PK) containers**. PR #4087's description explicitly calls it out as awaiting the SDK-to-driver cutover. **Do NOT** wire Phase 2 header injection to this function; use `EffectivePartitionKey::compute()` / `::compute_range()` (§3.3). + +### 3.6 Not Yet Implemented (Gaps) + 1. **RNTBD serialization/deserialization** — No binary protocol encoding/decoding exists -2. **Gateway 2.0 header injection** — Thin client proxy headers (`x-ms-thinclient-proxy-operation-type`, `x-ms-thinclient-proxy-resource-type`, EPK range headers) are not applied to requests +2. **Gateway 2.0 header injection** — Thin client proxy headers and EPK range headers are not applied to requests on the Gateway 2.0 path 3. 
**Supported operation filtering** — No `IsOperationSupportedByThinClient()` equivalent -4. **Gateway 2.0 endpoint discovery** — Verify account metadata parsing of `thinClientReadableLocations`/`thinClientWritableLocations` -5. **Session token handling** — Gateway 2.0 may handle session tokens differently (partition-key-range-id prefix) -6. **Gateway 2.0 specific retry logic** — Fallback from gateway 2.0 to standard gateway on specific errors -7. **Integration/E2E tests** — No gateway 2.0 test coverage -8. **Fault injection** — No gateway 2.0 fault injection scenarios +4. **`x-ms-cosmos-use-thinclient` header** on account metadata requests (to trigger thin-client endpoint advertisement) +5. **SDK-to-driver cutover for EPK** — SDK call sites (`feed_range_from_partition_key`, `container_connection.rs:87`) still call the broken SDK hash; they must route through the driver's `EffectivePartitionKey::compute()` +6. **Session token handling** — Gateway 2.0 may handle session tokens differently (partition-key-range-id prefix) +7. **Gateway 2.0 specific fallback** — Failure-driven fallback from Gateway 2.0 to standard gateway (see Phase 4) +8. **Integration/E2E tests** — No gateway 2.0 test coverage beyond the routing-systems unit tests +9. **Fault injection** — No gateway 2.0 fault injection scenarios +10. **Constants cross-crate visibility** — `THINCLIENT_PROXY_*` and `START_EPK` / `END_EPK` currently live in `azure_data_cosmos::constants` but Phase 2 injects headers from the driver crate. Options (to decide in Phase 2): (a) move constants to `azure_data_cosmos_driver::constants` and re-export from SDK, (b) re-export SDK constants through a driver-side `pub use`, or (c) duplicate. Recommend (a). --- ## 4. Rust Implementation Plan -### Current Request Flow (Gateway 1.0) +### 4.1 Current Request Flow (Gateway 1.0) 1. `CosmosClient::create_item(T)` calls `ContainerClient` -2. `container_connection.rs` serializes `T` to `&[u8]`, computes EPK, resolves PKRange +2. 
`container_connection.rs` serializes `T` to `&[u8]`, computes EPK (via the broken SDK hash today — see §3.5), resolves PKRange 3. `CosmosDriver::execute_operation()` enters the Operation Pipeline (7-stage loop) 4. `resolve_endpoint()` selects a gateway endpoint 5. Transport Pipeline applies cosmos headers, signs request 6. HTTP/REST request sent to Cosmos Gateway (shared proxy, no SLA) -### Target Request Flow (Gateway 2.0) +### 4.2 Target Request Flow (Gateway 2.0) 1. `CosmosClient::create_item(T)` calls `ContainerClient` -2. `container_connection.rs` serializes `T` to `&[u8]`, computes EPK, resolves PKRange +2. `container_connection.rs` serializes `T` to `&[u8]`; EPK computation is deferred to the driver (via `EffectivePartitionKey::compute()` / `::compute_range()`), which then resolves PKRange 3. `CosmosDriver::execute_operation()` enters the Operation Pipeline (7-stage loop) -4. `resolve_endpoint()` prefers gateway 2.0 endpoint (if available from account metadata) -5. Transport Pipeline checks `is_supported_by_gw20()?`: - - **YES**: Inject gateway 2.0 headers + RNTBD serialize -> HTTP/2 POST to Gateway 2.0 Proxy (SLA) - - **NO**: Standard HTTP/REST request to Cosmos Gateway (fallback) +4. `resolve_endpoint()` prefers gateway 2.0 endpoint (if `prefer_gateway20` per §3.4) +5. Transport Pipeline checks `is_operation_supported_by_gateway20()`: + - **YES**: Inject gateway 2.0 headers + RNTBD serialize → HTTP/2 POST to Gateway 2.0 Proxy (SLA) + - **NO**: Standard HTTP/REST request to Cosmos Gateway (eligibility fallback — per-request, deterministic) + +> **Naming**: The function is `is_operation_supported_by_gateway20()` throughout. Older drafts used `is_supported_by_gw20()` — do not reintroduce the abbreviation. 
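The eligibility branch in steps 4 and 5 can be sketched as below. This is a minimal illustration under stated assumptions, not the driver's code. The trimmed-down `ResourceType`, `OperationType`, and `TransportMode` enums and the helpers `prefer_gateway20` and `select_transport_mode` are hypothetical stand-ins; only the name `is_operation_supported_by_gateway20` comes from this spec. The sketch assumes the per-request `prefer_gateway20` flag from §3.4 is already available and does not model the change-feed-mode caveat for `ReadFeed`.

```rust
// Hypothetical, trimmed-down stand-ins for the driver's enums.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResourceType {
    Document,
    StoredProcedure,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OperationType {
    Create,
    Read,
    Query,
    ExecuteJavaScript,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TransportMode {
    Gateway,
    Gateway20,
}

/// Java parity: only Document operations are eligible, so StoredProcedure
/// Execute always takes the standard gateway. The ReadFeed change-feed-mode
/// caveat is not modeled in this sketch.
pub fn is_operation_supported_by_gateway20(
    resource_type: ResourceType,
    _operation_type: OperationType,
) -> bool {
    resource_type == ResourceType::Document
}

/// Hypothetical helper: computed once per request in `resolve_endpoint` (§3.4).
pub fn prefer_gateway20(pool_allows_gateway20: bool, account_has_thin_client_endpoints: bool) -> bool {
    pool_allows_gateway20 && account_has_thin_client_endpoints
}

/// Hypothetical helper for step 5: an ineligible operation falls back to the
/// standard gateway for this request only; nothing is remembered between requests.
pub fn select_transport_mode(
    prefer: bool,
    resource_type: ResourceType,
    operation_type: OperationType,
) -> TransportMode {
    if prefer && is_operation_supported_by_gateway20(resource_type, operation_type) {
        TransportMode::Gateway20
    } else {
        TransportMode::Gateway
    }
}
```

Keeping the check a pure function of `(resource_type, operation_type)` is what makes the eligibility fallback deterministic per request, as opposed to the failure-driven fallback described in Phase 4.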
--- @@ -128,31 +181,40 @@ The RNTBD (Reliable Network Transfer Binary Data) protocol is the wire format us - **`rntbd/tokens.rs`** — RNTBD token types (type IDs, lengths, value encodings) used in metadata sections - **`rntbd/status.rs`** — RNTBD status code mapping to `CosmosStatus` +#### Versioning + +Thin client RNTBD has no version negotiation on the wire. The proxy advertises a single supported frame format per endpoint and rejects mismatched frames at the HTTP layer (the HTTP/2 request fails rather than triggering an RNTBD version-mismatch error). Direct-mode RNTBD has version negotiation (`CURRENT_PROTOCOL_VERSION = 0x00000001`); **do not** apply that pattern here. + #### RNTBD Request Wire Format -There is no version negotiation for thin client RNTBD. The frame format is fixed (derived from Java `RntbdRequestFrame`): +The frame layout is derived from Java `com.azure.cosmos.implementation.directconnectivity.rntbd.RntbdRequestFrame.encode(...)`, which writes: -| Offset | Size | Field | Encoding | -| --- | --- | --- | --- | -| 0 | 4 | Total message length | uint32 LE (frame + metadata + payload) | -| 4 | 2 | Resource type | uint16 LE | -| 6 | 2 | Operation type | uint16 LE | -| 8 | 16 | Activity ID | UUID (two uint64 LE) | -| 24 | var | Metadata tokens | Token stream (filtered by `thinClientProxyExcludedSet`) | -| 24+N | 4 | Payload length | uint32 LE (only if payload present) | -| 28+N | var | Payload body | Raw bytes (JSON or Cosmos binary) | +```java +out.writeIntLE(totalLength); +out.writeShortLE(resourceType.id()); +out.writeShortLE(operationType.id()); +RntbdUUID.encode(activityId, out); // two longs +``` -Frame header is 24 bytes. The `encode(byteBuf, forThinClient=true)` flag in Java excludes certain headers that the proxy does not need. +| Offset | Size | Field | Encoding | Notes | +| --- | --- | --- | --- | --- | +| 0 | 4 | Total message length | uint32 LE | **Inclusive** of the 4 length bytes themselves (matches Java `writeIntLE` semantics). 
| +| 4 | 2 | Resource type | uint16 LE | `writeShortLE(resourceType.id())` — narrower than direct-mode RNTBD's uint32 because thin-client IDs fit in 16 bits. | +| 6 | 2 | Operation type | uint16 LE | `writeShortLE(operationType.id())` — same rationale. | +| 8 | 16 | Activity ID | UUID, two uint64 LE | Java writes `(mostSignificantBits, leastSignificantBits)` as two little-endian `long`s — **this is not RFC 4122 byte order**. Example: UUID `12345678-1234-5678-1234-567812345678` → bytes `78 56 34 12 34 12 78 56` (MSB LE) then `78 56 34 12 78 56 34 12` (LSB LE). | +| 24 | var | Metadata tokens | Token stream | Filtered by `thinClientProxyExcludedSet` (see §Phase 2 header naming). | +| 24+N | 4 | Payload length | uint32 LE | **Only present when the operation type implies a payload** (writes, patch, query body, stored-proc args, batch). Absence is signaled by operation-type convention, not a flag bit. Parsers must consult the operation-type → has-payload table derived from Java's `RntbdRequestArgs`. | +| 28+N | var | Payload body | Raw bytes | JSON or Cosmos binary, per resource type. | #### RNTBD Response Wire Format -| Offset | Size | Field | Encoding | -| --- | --- | --- | --- | -| 0 | 4 | Total message length | uint32 LE | -| 4 | 4 | Status code | uint32 LE | -| 8 | 16 | Activity ID | UUID (two uint64 LE) | -| 24 | var | Metadata tokens | Token stream (request charge, session token, continuation, etc.) | -| 24+N | var | Body payload | Raw bytes (optional) | +| Offset | Size | Field | Encoding | Notes | +| --- | --- | --- | --- | --- | +| 0 | 4 | Total message length | uint32 LE | Inclusive of the 4 length bytes (same convention as request). | +| 4 | 4 | Status code | uint32 LE | Maps to HTTP status + `CosmosStatus`. | +| 8 | 16 | Activity ID | UUID, two uint64 LE | Same MSB-LE / LSB-LE pairing as request. | +| 24 | var | Metadata tokens | Token stream | Request charge, session token, continuation, etc. 
| +| 24+N | var | Body payload | Raw bytes | Optional; presence determined by total-length arithmetic (`total_length - header_and_tokens_len > 0`). | #### Files Changed @@ -175,15 +237,14 @@ This phase wires RNTBD serialization into the existing transport pipeline and ad #### What Will Be Done -- **Operation filtering** — New function `is_operation_supported_by_gateway20(resource_type, operation_type) → bool` to match .NET's `IsOperationSupportedByThinClient()`. Only data-plane document operations + stored procedure Execute are eligible. -- **Header injection** — When `TransportMode::Gateway20`, inject thin client headers before sending: - - `x-ms-thinclient-proxy-operation-type` — numeric operation type - - `x-ms-thinclient-proxy-resource-type` — numeric resource type - - `x-ms-effective-partition-key` — EPK hash for point operations - - `x-ms-thinclient-range-min` / `x-ms-thinclient-range-max` — EPK range for feed operations -- **Request body wrapping** — Serialize the entire request (headers + body) into RNTBD binary format and POST as the HTTP/2 body -- **Response unwrapping** — Deserialize the RNTBD response body back into `CosmosResponseHeaders` + raw document bytes -- **Fallback** — If operation is not supported by gateway 2.0, transparently route through standard gateway +- **Operation filtering** — `is_operation_supported_by_gateway20(resource_type, operation_type) → bool`. Following Java (`ThinClientStoreModel`), only `ResourceType::Document` operations are eligible. The .NET position (`IsOperationSupportedByThinClient` additionally allows `StoredProcedure::ExecuteJavaScript`) is **intentionally not adopted**. +- **EPK computation** — Call `EffectivePartitionKey::compute()` (point) or `::compute_range()` (feed/cross-partition) from the driver layer. Do **not** call `azure_data_cosmos::hash::get_hashed_partition_key_string` (§3.5). SDK call sites that currently use it must route through the driver's implementation as part of this phase. 
+- **EPK error propagation** — If EPK computation returns `Err` (MultiHash-requires-V2, component-count mismatch, etc.), surface as `CosmosStatus::BadRequest` to the caller. **Do not** fall back to standard gateway — the same inputs would be equally broken there. +- **Header injection** — When `transport_mode == Gateway20`, inject the thin-client headers listed below. +- **Request body wrapping** — Serialize the entire request (headers + body) into RNTBD binary format and POST as the HTTP/2 body. +- **Response unwrapping** — Deserialize the RNTBD response body back into `CosmosResponseHeaders` + raw document bytes. +- **Eligibility fallback** — Operation ineligible for Gateway 2.0 → route through standard gateway for this single request (per-request, deterministic). See §Phase 4 for the distinct failure-driven fallback. +- **Constants placement** — Resolve the cross-crate constants question from §3.6-10 (recommend: move `THINCLIENT_PROXY_*` and `START_EPK` / `END_EPK` to a driver-side module, re-export from SDK). #### Supported Operations @@ -200,22 +261,44 @@ Only `ResourceType::Document` is eligible for gateway 2.0 (following Java's appr | Query | Yes | | | QueryPlan | Yes | | | ReadFeed | Yes | LatestVersion change feed only; excludes AllVersionsAndDeletes | -| Batch | Yes | Includes bulk operations | -| StoredProcedure Execute | **No** | Following Java; only `ResourceType::Document` is eligible | +| Batch | Yes | Transactional same-PK batch (single resource, single request). | +| Bulk | Yes | SDK-side fan-out of independent CRUD ops; each fan-out leg is a separate eligible Document op. Distinct from Batch. | +| StoredProcedure Execute | **No** | Following Java; Rust does **not** follow .NET's `ExecuteJavaScript` allowance. 
| | All other resource types | **No** | Metadata operations use standard gateway | +#### Header naming (proxy headers, in HTTP/2 request headers — not RNTBD tokens) + +These are wire-level HTTP/2 request headers on the outer POST to the proxy. They are **not** inside the RNTBD metadata token stream. + +| Header (wire) | Rust constant (crate) | Semantics | When emitted | +| --- | --- | --- | --- | +| `x-ms-thinclient-proxy-operation-type` | `THINCLIENT_PROXY_OPERATION_TYPE` (SDK today; move to driver per §3.6-10) | Numeric operation type | Every Gateway 2.0 request | +| `x-ms-thinclient-proxy-resource-type` | `THINCLIENT_PROXY_RESOURCE_TYPE` (SDK today; move) | Numeric resource type | Every Gateway 2.0 request | +| `x-ms-effective-partition-key` | **NEW** — `EFFECTIVE_PARTITION_KEY` (driver) | Canonical EPK hex | Point ops only | +| `x-ms-thinclient-range-min` | **Reuse** `START_EPK` (= `x-ms-start-epk`) — confirm header name with service, or add new constant if the proxy requires `x-ms-thinclient-range-min` literally | Lower bound of EPK range | Feed / cross-partition ops only | +| `x-ms-thinclient-range-max` | **Reuse** `END_EPK` (= `x-ms-end-epk`) — same caveat | Upper bound of EPK range | Feed / cross-partition ops only | +| `x-ms-cosmos-use-thinclient` | **NEW** (driver) | Instructs account-metadata response to advertise thin-client endpoints | Account metadata fetches only | + +**Action item for Phase 2**: confirm with the service team whether the proxy expects `x-ms-start-epk` / `x-ms-end-epk` (existing constants) or `x-ms-thinclient-range-min` / `x-ms-thinclient-range-max` (Java naming). If the latter, introduce new constants and retire the former on the Gateway 2.0 path. + +#### Range header wire format + +EPK range headers (`x-ms-thinclient-range-min` / `-max`) carry the canonical, un-padded hex produced by `EffectivePartitionKey::compute_range()`. **Do not** zero-pad to N×32 on the wire. 
Local comparisons use `epk_length_aware_cmp` (in `container_routing_map.rs`, introduced by PR #4087) which correctly handles the mixed-length boundaries returned by the backend. `@analogrelay`'s earlier zero-padding proposal was **not** adopted; stay consistent with the length-aware convention. + +> **`Range` semantics footgun** (from PR #4087): `compute_range` returns a Rust `std::ops::Range` where `start == end` denotes a **point operation**. Standard `Range` iteration treats that as empty, so code that uses `.contains()` or iterates the range directly will misbehave. Always treat `start == end` as the point case explicitly. + #### Gateway 2.0 Header Injection Flow When `transport_mode == Gateway20`: 1. Set `x-ms-thinclient-proxy-operation-type` (numeric operation type) 2. Set `x-ms-thinclient-proxy-resource-type` (numeric resource type) -3. Point operation? Set `x-ms-effective-partition-key` (EPK hash) - Feed operation? Set `x-ms-thinclient-range-min` and `x-ms-thinclient-range-max` -4. Serialize headers + body into RNTBD binary format +3. Point operation? Set `x-ms-effective-partition-key` (EPK hash from `EffectivePartitionKey::compute()`) + Feed operation? Set `x-ms-thinclient-range-min` and `x-ms-thinclient-range-max` (from `EffectivePartitionKey::compute_range()`) +4. Serialize headers + body into RNTBD binary format (Phase 1) 5. POST RNTBD body to gateway 2.0 endpoint via HTTP/2 -When `transport_mode != Gateway20`: Standard HTTP/REST request (existing flow, unchanged) +When `transport_mode != Gateway20`: Standard HTTP/REST request (existing flow, unchanged). 
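Steps 1 to 4 of the injection flow can be sketched as follows. Assumptions: EPK values arrive as canonical, un-padded hex strings, and the proxy type headers carry the numeric IDs as decimal strings. The constant names and the functions `gateway20_headers`, `encode_frame_prefix`, and `parse_uuid` are hypothetical. The range header names follow this flow and may change once the `x-ms-start-epk` / `x-ms-end-epk` question is settled. The frame prefix follows the Phase 1 request table: an inclusive uint32 LE length, uint16 LE resource and operation types, then the activity ID as two little-endian 64-bit halves, most significant half first.

```rust
use std::ops::Range;

// Hypothetical constant names. The range header names are still an open
// question; the proxy may expect x-ms-start-epk / x-ms-end-epk instead.
const PROXY_OPERATION_TYPE: &str = "x-ms-thinclient-proxy-operation-type";
const PROXY_RESOURCE_TYPE: &str = "x-ms-thinclient-proxy-resource-type";
const EFFECTIVE_PARTITION_KEY: &str = "x-ms-effective-partition-key";
const RANGE_MIN: &str = "x-ms-thinclient-range-min";
const RANGE_MAX: &str = "x-ms-thinclient-range-max";

/// Steps 1 to 3. `epk` stands in for the output of
/// `EffectivePartitionKey::compute_range()` as un-padded hex strings.
fn gateway20_headers(
    operation_type: u16,
    resource_type: u16,
    epk: &Range<String>,
) -> Vec<(&'static str, String)> {
    let mut headers = vec![
        (PROXY_OPERATION_TYPE, operation_type.to_string()),
        (PROXY_RESOURCE_TYPE, resource_type.to_string()),
    ];
    // start == end marks a point operation. Range::is_empty() reports true
    // for that case, so the bounds are compared directly.
    if epk.start == epk.end {
        headers.push((EFFECTIVE_PARTITION_KEY, epk.start.clone()));
    } else {
        headers.push((RANGE_MIN, epk.start.clone()));
        headers.push((RANGE_MAX, epk.end.clone()));
    }
    headers
}

/// Step 4 (Phase 1 layout): the fixed 24-byte request frame prefix.
/// `total_len` must already include these 4 length bytes.
fn encode_frame_prefix(total_len: u32, resource_type: u16, operation_type: u16, activity_id: u128) -> [u8; 24] {
    let mut out = [0u8; 24];
    out[0..4].copy_from_slice(&total_len.to_le_bytes());
    out[4..6].copy_from_slice(&resource_type.to_le_bytes());
    out[6..8].copy_from_slice(&operation_type.to_le_bytes());
    // Mirrors Java's RntbdUUID.encode: most significant 64 bits first, each half LE.
    out[8..16].copy_from_slice(&((activity_id >> 64) as u64).to_le_bytes());
    out[16..24].copy_from_slice(&(activity_id as u64).to_le_bytes());
    out
}

/// Parses a hyphenated UUID string into its 128-bit value.
fn parse_uuid(s: &str) -> Option<u128> {
    let hex: String = s.chars().filter(|c| *c != '-').collect();
    if hex.len() != 32 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    u128::from_str_radix(&hex, 16).ok()
}
```

For the UUID `0a1b2c3d-4e5f-6789-abcd-ef0123456789`, bytes 8 to 15 of the prefix come out as `89 67 5f 4e 3d 2c 1b 0a` and bytes 16 to 23 as `89 67 45 23 01 ef cd ab`, which is not RFC 4122 byte order.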
#### Files Changed @@ -225,22 +308,29 @@ EDIT src/driver/transport/transport_pipeline.rs — Branch on TransportMode in EDIT src/driver/transport/cosmos_headers.rs — Add gateway 2.0 header application EDIT src/driver/transport/mod.rs — Add is_operation_supported_by_gateway20() EDIT src/driver/pipeline/components.rs — Add EPK fields to TransportRequest if needed +EDIT src/driver/constants.rs (or NEW) — Relocate THINCLIENT_PROXY_* constants per §3.6-10 +EDIT sdk/cosmos/azure_data_cosmos/src/... — Replace SDK-side get_hashed_partition_key_string callers with driver's EffectivePartitionKey::compute() ``` --- -### Phase 3: Endpoint Discovery (Driver Layer) +### Phase 3: Endpoint Discovery — verification & one new header **Crate**: `azure_data_cosmos_driver` -This phase is largely already implemented. The remaining work is confirming the endpoint URL pattern and ensuring the account metadata cache properly resolves gateway 2.0 URLs. +> Most of Phase 3 is **audit / verification** against scaffolding already in place (§3.1, §3.2). Only the `x-ms-cosmos-use-thinclient` request header is net-new code. Noted here because the dependency graph lists Phase 3 as a prerequisite for Phase 2; in practice the verification items can happen in parallel with Phase 1 and the one real code change can ride with Phase 2 if convenient. 
#### What Will Be Done -- **Verify** account metadata cache already parses `thin_client_readable_locations` / `thin_client_writable_locations` into gateway 2.0 URLs -- **Confirm** `build_account_endpoint_state()` in routing_systems.rs correctly constructs `CosmosEndpoint::regional_with_gateway20()` -- **Test** endpoint discovery with live account that has gateway 2.0 enabled -- **Add** `x-ms-cosmos-use-thinclient` header to account metadata requests to trigger gateway 2.0 endpoint advertisement +- **Verify** account metadata cache parses `thinClientReadableLocations` / `thinClientWritableLocations` into `CosmosEndpoint::gateway20_url` (existing, per §3.2) +- **Confirm** `build_account_endpoint_state()` constructs `CosmosEndpoint::regional_with_gateway20()` correctly in multi-region accounts (existing tests at `routing_systems.rs:218–289` already cover this) +- **Verify** `AccountProperties::has_thin_client_endpoints()` is used as the gating signal per §3.4 +- **Add** `x-ms-cosmos-use-thinclient` request header on account metadata fetches (new code) +- **Test** endpoint discovery with live account that has gateway 2.0 enabled (handled by Phase 6 live pipeline) + +#### Region pairing (lock in the PR #3942 decision) + +Thin-client read locations pair **only** with read regions; thin-client write locations pair **only** with write regions. A write region that advertises no thin-client URL falls back to standard gateway **for writes** (this was deliberate in PR #3942: session retries that reroute reads to write endpoints would otherwise cross the read/write thin-client split). This is a correctness invariant — do not "fix" it by cross-pairing.
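A minimal sketch of the pairing rule, assuming each kind of location arrives as a map from region name to URL. `RegionEndpoint` and `pair_with_thin_client` are hypothetical names. Here the invariant is enforced by call-site discipline: read regions are paired only with the readable thin-client map and write regions only with the writable one.

```rust
use std::collections::HashMap;

/// Hypothetical per-region endpoint record.
#[derive(Debug, Clone, PartialEq, Eq)]
struct RegionEndpoint {
    gateway_url: String,
    gateway20_url: Option<String>,
}

/// Hypothetical helper: attaches a Gateway 2.0 URL to each region only if the
/// matching thin-client map (readable for reads, writable for writes) has one.
fn pair_with_thin_client(
    regions: &[(&str, &str)],
    thin_client_locations: &HashMap<&str, &str>,
) -> HashMap<String, RegionEndpoint> {
    regions
        .iter()
        .map(|(name, gateway_url)| {
            let endpoint = RegionEndpoint {
                gateway_url: gateway_url.to_string(),
                // No entry in this map means the standard gateway for this kind of op.
                gateway20_url: thin_client_locations.get(name).map(|url| url.to_string()),
            };
            (name.to_string(), endpoint)
        })
        .collect()
}
```

With a region that advertises only a readable thin-client URL, reads there get a `gateway20_url` while the write endpoint keeps `None`, so writes in that region use the standard gateway.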
#### Endpoint Discovery Flow (Existing) @@ -256,9 +346,9 @@ Account metadata response includes: #### Files Changed ``` -EDIT src/driver/cache/account_metadata_cache.rs — Verify/fix thin client endpoint parsing -EDIT src/driver/transport/cosmos_headers.rs — Add x-ms-cosmos-use-thinclient header -TEST src/driver/routing/routing_systems.rs — Add tests for gateway 2.0 endpoint construction +EDIT src/driver/cache/account_metadata_cache.rs — Verify thin client endpoint parsing (audit only) +EDIT src/driver/transport/cosmos_headers.rs — Add x-ms-cosmos-use-thinclient header (NEW) +TEST src/driver/routing/routing_systems.rs — Add tests for read/write pairing edge cases ``` --- @@ -267,20 +357,32 @@ TEST src/driver/routing/routing_systems.rs — Add tests for gateway 2.0 **Crate**: `azure_data_cosmos_driver` -Retry policies are identical between thin client and standard gateway modes in both Java and .NET — only endpoint selection and request encoding differ. The existing retry pipeline should work as-is for most cases. +Retry policies are identical between Gateway 2.0 and standard gateway modes in both Java and .NET — only endpoint selection and request encoding differ. The existing retry pipeline should work as-is for most cases. #### What Will Be Done -- **Timeout policy** — Gateway 2.0 requests may use different timeout values than standard gateway -- **Read timeout cross-region retry** — On HTTP 408 with `GATEWAY_ENDPOINT_READ_TIMEOUT` sub-status, retry read operations in the next preferred region -- **Service unavailable (503)** — Mark endpoint unavailable for partition key range, then retry. Follow Java's conservative approach: only retry server-returned 503 or SDK-generated 503 with `SERVER_GENERATED_410` sub-status +- **Timeout policy** — Gateway 2.0 requests use the timeout regime defined in `TRANSPORT_PIPELINE_SPEC.md` (single timeout, not bifurcated). Do not introduce Gateway-2.0-specific timeouts. 
+- **Read timeout cross-region retry** — On HTTP 408 with `GATEWAY_ENDPOINT_READ_TIMEOUT` sub-status, retry read operations in the next preferred region. +- **Service unavailable (503)** — Mark endpoint unavailable for partition key range, then retry. Follow Java's conservative approach: only retry server-returned 503 or SDK-generated 503 with `SERVER_GENERATED_410` sub-status. - **Gone (410)** — Action depends on sub-status code: - `PARTITION_KEY_RANGE_GONE` (1002): Refresh PKRange cache, retry - `COMPLETING_SPLIT_OR_MERGE` (1007): Refresh PKRange cache, retry - `COMPLETING_PARTITION_MIGRATION` (1008): Refresh PKRange cache, retry - `NAME_CACHE_IS_STALE` (1000): Refresh **collection** cache (NOT PKRange), retry - Other sub-statuses: Retry with backoff, no cache refresh -- **Gateway 2.0 fallback** — On persistent gateway 2.0 failures (e.g., proxy down), fall back to standard gateway for the remainder of the operation +- **Gateway 2.0 failure-driven fallback** — see "Fallback taxonomy" below. +- **Partition-Level Failover interaction** — when PLF (see `PARTITION_LEVEL_FAILOVER_SPEC.md`) selects a region whose `CosmosEndpoint` has no `gateway20_url`, **PLF wins**: the request falls back to standard gateway **for that partition** until PLF releases its override. PLF precedence prevents Gateway 2.0 from overriding an explicit per-partition region choice. 
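The 410 sub-status rules above map naturally onto a `match`. This sketch assumes sub-status codes arrive as plain integers. `GoneAction` and `classify_gone` are hypothetical names, and only the numeric codes come from this spec.

```rust
// 410 Gone sub-status codes listed in this spec.
const NAME_CACHE_IS_STALE: u32 = 1000;
const PARTITION_KEY_RANGE_GONE: u32 = 1002;
const COMPLETING_SPLIT_OR_MERGE: u32 = 1007;
const COMPLETING_PARTITION_MIGRATION: u32 = 1008;

/// Hypothetical classification of what the retry loop does on a 410.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GoneAction {
    RefreshPartitionKeyRangeCache,
    RefreshCollectionCache,
    RetryWithBackoff,
}

fn classify_gone(sub_status: u32) -> GoneAction {
    match sub_status {
        PARTITION_KEY_RANGE_GONE | COMPLETING_SPLIT_OR_MERGE | COMPLETING_PARTITION_MIGRATION => {
            GoneAction::RefreshPartitionKeyRangeCache
        }
        // A stale name cache refreshes the collection cache, not the PKRange cache.
        NAME_CACHE_IS_STALE => GoneAction::RefreshCollectionCache,
        // Any other sub-status: retry with backoff, no cache refresh.
        _ => GoneAction::RetryWithBackoff,
    }
}
```

The function takes no transport-mode input, consistent with the statement that retry policies are shared between Gateway 2.0 and the standard gateway.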
+ +#### Fallback taxonomy + +Two distinct fallback mechanisms — do not conflate them: + +| Name | Scope | Trigger | Duration | Unwind | +| --- | --- | --- | --- | --- | +| **Eligibility fallback** | Per-request | Operation is not eligible for Gateway 2.0 (fails `is_operation_supported_by_gateway20()`) | Single request only | N/A — recomputed every request | +| **Failure fallback** | Per-partition, sticky | N consecutive 503 `Proxy unreachable` or equivalent within a rolling window (target N=3, window=30s — confirm in implementation tuning) | Sticky until unwind | (a) next successful account-metadata refresh removes the affected `gateway20_url`, OR (b) a periodic probe to the proxy succeeds, OR (c) a fixed cooldown (target 60s) expires, whichever is first | + +Failure fallback is per-partition rather than per-client so that one bad proxy region does not degrade requests to other partitions. Client-lifetime stickiness is explicitly **not** used — it would prevent recovery within the process. 
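The failure-fallback row can be sketched as a small per-partition state machine. The assumptions are as follows:

- Failures are keyed by a partition-key-range ID string.
- A streak counts only if all N failures land within the window measured from the first one. This is one reading of "N consecutive ... within a rolling window".
- The cooldown is checked lazily on lookup.

`FailureFallback` and its methods are hypothetical names. The N=3, 30s, and 60s values are the spec's targets and are still pending tuning. Unwind triggers (a) metadata refresh and (b) successful probe both map to `clear`.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Target values from the fallback taxonomy; still pending tuning.
const FAILURE_THRESHOLD: u32 = 3;
const FAILURE_WINDOW: Duration = Duration::from_secs(30);
const COOLDOWN: Duration = Duration::from_secs(60);

#[derive(Debug, Default)]
struct PartitionState {
    consecutive_failures: u32,
    streak_started: Option<Instant>,
    fallback_since: Option<Instant>,
}

/// Hypothetical per-partition tracker, keyed by partition-key-range ID.
#[derive(Debug, Default)]
struct FailureFallback {
    partitions: HashMap<String, PartitionState>,
}

impl FailureFallback {
    /// Records a "proxy unreachable" failure; returns true once the
    /// partition is in failure fallback.
    fn record_failure(&mut self, pk_range_id: &str, now: Instant) -> bool {
        let state = self.partitions.entry(pk_range_id.to_string()).or_default();
        match state.streak_started {
            // Still inside the window that began with the first failure.
            Some(start) if now.duration_since(start) <= FAILURE_WINDOW => {
                state.consecutive_failures += 1;
            }
            // No open streak, or the window elapsed: start a new one.
            _ => {
                state.streak_started = Some(now);
                state.consecutive_failures = 1;
            }
        }
        if state.consecutive_failures >= FAILURE_THRESHOLD && state.fallback_since.is_none() {
            state.fallback_since = Some(now);
        }
        state.fallback_since.is_some()
    }

    /// A successful Gateway 2.0 response breaks the streak.
    fn record_success(&mut self, pk_range_id: &str) {
        if let Some(state) = self.partitions.get_mut(pk_range_id) {
            state.consecutive_failures = 0;
            state.streak_started = None;
        }
    }

    /// Unwind triggers (a) metadata refresh and (b) successful probe.
    fn clear(&mut self, pk_range_id: &str) {
        self.partitions.remove(pk_range_id);
    }

    /// True while requests for this partition should use the standard
    /// gateway. Unwind trigger (c), the cooldown, is checked here lazily.
    fn in_fallback(&mut self, pk_range_id: &str, now: Instant) -> bool {
        let since = match self.partitions.get(pk_range_id).and_then(|s| s.fallback_since) {
            Some(since) => since,
            None => return false,
        };
        if now.duration_since(since) >= COOLDOWN {
            self.partitions.remove(pk_range_id);
            return false;
        }
        true
    }
}
```

Passing `now` explicitly, rather than calling `Instant::now()` inside, keeps the window and cooldown logic deterministic under test.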
#### Retry Decision Table @@ -296,17 +398,17 @@ Retry policies are identical between thin client and standard gateway modes in b | 410 Gone | other | Retry with backoff | | 429 Throttled | — | Existing throttle retry loop (unchanged) | | 449 Retry With | — | Retry same region (transient conflict) | -| 503 Unavailable | server-returned | Mark endpoint unavailable, failover | +| 503 Unavailable | server-returned | Mark endpoint unavailable, failover; increment failure-fallback counter | | 503 Unavailable | SDK-generated | Only retry if `SERVER_GENERATED_410` sub-status | -| Proxy unreachable | — | Fallback to `TransportMode::Gateway` (standard HTTP/REST) | +| Proxy unreachable | — | Increment failure-fallback counter; if threshold crossed, enter Failure fallback (§Fallback taxonomy) and route remainder through `TransportMode::Gateway` | #### Files Changed ``` -EDIT src/driver/pipeline/operation_pipeline.rs — Gateway 2.0 retry classification -EDIT src/driver/pipeline/components.rs — Add Gateway20Fallback variant if needed -EDIT src/driver/transport/transport_pipeline.rs — Timeout policy for gateway 2.0 -NEW src/driver/pipeline/gateway20_retry.rs — Gateway 2.0 specific retry logic (if needed) +EDIT src/driver/pipeline/operation_pipeline.rs — Gateway 2.0 retry classification + PLF precedence +EDIT src/driver/pipeline/components.rs — Add Gateway20FailureFallback state if needed +EDIT src/driver/transport/transport_pipeline.rs — Wire failure-fallback counter into transport +NEW src/driver/pipeline/gateway20_retry.rs — Gateway 2.0 failure-fallback state machine ``` --- @@ -319,24 +421,23 @@ Gateway 2.0 is **not exposed as a customer-facing configuration**. The SDK autom #### What Will Be Done -- **Auto-detection** — When account metadata includes `thinClientReadableLocations` / `thinClientWritableLocations`, the driver automatically prefers gateway 2.0 for eligible operations. No user opt-in required. 
-- **Internal config** — The existing `ConnectionPoolOptions.is_gateway20_allowed` remains internal-only (not exposed in `CosmosClientOptions`). It serves as a kill switch for testing or emergency fallback, not a user-facing setting. -- **Diagnostics** — `CosmosDiagnostics` should report when a request used gateway 2.0 vs standard gateway (already partially done via `TransportKind::Gateway20`) -- **User agent** — Update SDK user agent string to indicate gateway 2.0 capability -- **Container connection** — Ensure EPK is computed and available for gateway 2.0 header injection in the driver layer - -#### Auto-Detection Flow +- **Auto-detection** — When account metadata includes `thinClientReadableLocations` / `thinClientWritableLocations`, the driver automatically prefers gateway 2.0 for eligible operations (per §3.4). No user opt-in required. +- **Internal kill switch** — `ConnectionPoolOptions.is_gateway20_allowed` and its env var (§3.4) remain internal. They are NOT exposed in `CosmosClientOptions` and are unsupported/undocumented. +- **Diagnostics** — `CosmosDiagnostics` should report when a request used gateway 2.0 vs standard gateway (already partially done via `TransportKind::Gateway20`). +- **User agent** — Update SDK user agent string to indicate gateway 2.0 capability. +- **EPK cutover** — Replace SDK-side callers of `get_hashed_partition_key_string` with calls into the driver's `EffectivePartitionKey::compute()` / `::compute_range()` (this is the cutover PR #4087 flagged). Gateway 2.0 header injection depends on this being correct for hierarchical-PK containers. #### Auto-Detection Flow -When account metadata includes `thinClientReadableLocations`, gateway 2.0 is enabled automatically (internal). `CosmosEndpoint` gets `gateway20_url` and `resolve_endpoint()` prefers Gateway 2.0. No user configuration needed — transparent to the caller. +When account metadata includes `thinClientReadableLocations`, gateway 2.0 is enabled automatically (internal). 
`CosmosEndpoint` gets `gateway20_url` and `resolve_endpoint()` prefers Gateway 2.0 (per §3.4's single-source-of-truth rule). No user configuration needed — transparent to the caller. #### Files Changed ``` -EDIT src/driver_bridge.rs — Ensure internal config passes through -EDIT src/handler/container_connection.rs — Ensure EPK available for driver -EDIT src/constants.rs — Any new header constants +EDIT src/driver_bridge.rs — Ensure internal config passes through +EDIT src/handler/container_connection.rs — Route EPK through driver's EffectivePartitionKey::compute() +EDIT src/partition_key.rs — Update feed_range_from_partition_key call site +EDIT src/constants.rs — Relocate / re-export header constants per §3.6-10 ``` --- @@ -362,7 +463,7 @@ A **new dedicated CI pipeline** is required for gateway 2.0 live tests. Gateway - Single-region gateway 2.0 - Multi-region gateway 2.0 with failover -- Gateway 2.0 + standard gateway fallback +- Gateway 2.0 + standard gateway fallback (both eligibility and failure-driven) **Test Suites:** @@ -377,6 +478,8 @@ A **new dedicated CI pipeline** is required for gateway 2.0 live tests. Gateway #### Pipeline Files +| Action | File | Purpose | +| --- | --- | --- | | NEW | `sdk/cosmos/ci-gateway20.yml` | Gateway 2.0 live tests pipeline definition (uses pre-provisioned account) | | EDIT | `sdk/cosmos/live-platform-matrix.json` | Add gateway 2.0 test matrix entry | @@ -385,18 +488,22 @@ A **new dedicated CI pipeline** is required for gateway 2.0 live tests. 
Gateway | Test Category | Unit | Integration | E2E | Scenarios | | --- | --- | --- | --- | --- | | RNTBD serialization | Yes | | | Round-trip, edge cases, malformed input | -| EPK computation | Yes | | | Single/hierarchical PK, hash versions 1 and 2 | -| Operation filtering | Yes | | | All ResourceType x OperationType combos | -| Header injection | Yes | | | Point vs feed EPK headers, proxy type headers | +| EPK computation | Yes | | | Single/hierarchical PK, hash versions 1 and 2, error cases (MultiHash V1, wrong component count) | +| Operation filtering | Yes | | | All ResourceType × OperationType combos; asserts StoredProc Execute is rejected | +| Header injection | Yes | | | Point vs feed EPK headers, proxy type headers, range-header un-padded form | | Gateway 2.0 transport | Yes | Yes | | Correct HTTP/2 config, sharded pool selection | +| Read/write pairing | Yes | | | Write region without thin-client falls back for writes only | | Point CRUD | | | Yes | Create, read, replace, upsert, patch, delete | | Query | | | Yes | SQL query, cross-partition | | Batch | | | Yes | Transactional batch ops | +| Bulk | | | Yes | Fan-out CRUD, distinct from Batch | | Change feed | | | Yes | LatestVersion, incremental | | Retry: 408 timeout | | Yes | | Cross-region for reads, local-only for writes | -| Retry: 503 | | Yes | | Regional failover | -| Retry: 410 Gone | | Yes | | PKRange refresh (sub-status specific) | -| Gateway 2.0 fallback | | Yes | | Proxy down -> standard gateway | +| Retry: 503 | | Yes | | Regional failover; failure-fallback trigger and unwind | +| Retry: 410 Gone | | Yes | | PKRange refresh (sub-status specific); NameCacheStale → collection cache | +| Eligibility fallback | | Yes | | StoredProc Execute → standard gateway | +| Failure fallback | | Yes | | Proxy down → sticky standard gateway; unwind via metadata refresh / cooldown | +| PLF precedence | | Yes | | Region without gw20_url + PLF override → standard gateway path | | Multi-region failover | | Yes | 
Yes | Preferred regions, failover | | Fault injection | | Yes | | Timeout, 503, network error | | Perf benchmarks | | | Yes | Already wired in perf crate | @@ -416,8 +523,8 @@ A **new dedicated CI pipeline** is required for gateway 2.0 live tests. Gateway ### Phase Dependency Graph -- **Phase 1** (RNTBD Protocol) and **Phase 3** (Endpoint Discovery) can proceed in parallel -- **Phase 2** (Request Pipeline) depends on Phase 1 and Phase 3 +- **Phase 1** (RNTBD Protocol) and the verification parts of **Phase 3** (Endpoint Discovery) can proceed in parallel +- **Phase 2** (Request Pipeline) depends on Phase 1, and folds in Phase 3's one new header (`x-ms-cosmos-use-thinclient`) - **Phase 4** (Retry/Errors) and **Phase 5** (SDK Integration) depend on Phase 2 - **Phase 6** (Testing) depends on all previous phases @@ -425,5 +532,7 @@ A **new dedicated CI pipeline** is required for gateway 2.0 live tests. Gateway ## 5. Open Questions -1. **HTTP/2 prior knowledge vs ALPN**: Rust already configures gateway 2.0 as HTTP/2 with prior knowledge — confirm this matches service expectations. -2. **Live test account provisioning**: Cosmos DB account configuration flags required to enable gateway 2.0 / thin client endpoints are not part of the standard Bicep templates. **Decision**: hardcode a dedicated, pre-provisioned thin client account for the gateway 2.0 live tests pipeline and reuse it across runs (rather than provisioning per-run via Bicep). The account name and credentials should be stored in pipeline secrets, with the pipeline reading the endpoint from environment variables. +- **Q1 — HTTP/2 prior knowledge vs ALPN**: Rust already configures gateway 2.0 as HTTP/2 with prior knowledge. `TRANSPORT_PIPELINE_SPEC.md` settled ALPN as the default negotiation for the broader sharded transport. **Need service team confirmation** on which the Gateway 2.0 proxy expects. _Resolution_: pending. 
+- **Q2 — Live test account provisioning**: Cosmos DB account configuration flags required to enable gateway 2.0 / thin client endpoints are not part of the standard Bicep templates. _Resolution_: hardcode a dedicated, pre-provisioned thin client account for the gateway 2.0 live tests pipeline and reuse it across runs (rather than provisioning per-run via Bicep). Account name and credentials stored in pipeline secrets (`AZURE_COSMOS_GW20_ENDPOINT`, `AZURE_COSMOS_GW20_KEY`); pipeline reads endpoint from environment variables. +- **Q3 — EPK range header names**: Does the Gateway 2.0 proxy accept `x-ms-start-epk` / `x-ms-end-epk` (existing Rust constants) or require `x-ms-thinclient-range-min` / `x-ms-thinclient-range-max` (Java naming)? _Resolution_: pending Phase 2 confirmation with service team. +- **Q4 — Failure-fallback thresholds**: Target values are N=3 consecutive 503s in a 30s window, 60s cooldown. _Resolution_: pending implementation tuning against live test pipeline data. From c732728fb4918fc93d3b451807e7ae418d079d8e Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Mon, 20 Apr 2026 14:43:48 -0700 Subject: [PATCH 03/48] Cosmos: address second-pass review of Gateway 2.0 spec Addresses three findings from the PR Deep Reviewer second pass: - F-A: replace the non-existent epk_length_aware_cmp citation with EffectivePartitionKey's Ord/cmp impl, cite the actual epk_cmp_* tests in container_routing_map.rs and the binary_search_by consumer site. Point PR #4087 at the correct claim. - F-B: fix the numerically wrong UUID worked example. The previous example for 12345678-1234-5678-1234-567812345678 wrote MSB bytes 78 56 34 12 34 12 78 56, conflating writeLongLE with byte-reversal of the hyphen groups. Replace with 0a1b2c3d-4e5f-6789-abcd- ef0123456789 so MSB and LSB give visually distinct LE sequences. 
- F-C: add a "Proxy unreachable definition" subsection enumerating transport-level (TCP refuse/timeout, TLS handshake, HTTP/2 GOAWAY, reqwest::Error connect/timeout/request before any status) and HTTP-infrastructure classes (502, 504, 503-without-Cosmos-substatus). Explicitly exclude responses carrying a Cosmos sub-status. Defer to TRANSPORT_PIPELINE_SPEC for broader classification. Cross-reference from the Retry Decision Table. Also add a "Java parity" subsection to Phase 4 documenting that ThinClientStoreModel extends RxGatewayStoreModel, that none of the Java retry policies have thin-client-specific code, and that the Rust failure-fallback counter is more thin-client-aware than Java's. Flag a Java behavioral nuance worth NOT replicating: Java marks the gateway endpoint (not the thin-client endpoint) unavailable on a thin-client 503. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../docs/GATEWAY_20_SPEC.md | 37 +++++++++++++++++-- 1 file changed, 33 insertions(+), 4 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index e69486863a7..e9d369b118c 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -201,7 +201,7 @@ RntbdUUID.encode(activityId, out); // two longs | 0 | 4 | Total message length | uint32 LE | **Inclusive** of the 4 length bytes themselves (matches Java `writeIntLE` semantics). | | 4 | 2 | Resource type | uint16 LE | `writeShortLE(resourceType.id())` — narrower than direct-mode RNTBD's uint32 because thin-client IDs fit in 16 bits. | | 6 | 2 | Operation type | uint16 LE | `writeShortLE(operationType.id())` — same rationale. | -| 8 | 16 | Activity ID | UUID, two uint64 LE | Java writes `(mostSignificantBits, leastSignificantBits)` as two little-endian `long`s — **this is not RFC 4122 byte order**.
Example: UUID `12345678-1234-5678-1234-567812345678` → bytes `78 56 34 12 34 12 78 56` (MSB LE) then `78 56 34 12 78 56 34 12` (LSB LE). | +| 8 | 16 | Activity ID | UUID, two uint64 LE | Java writes `(mostSignificantBits, leastSignificantBits)` as two little-endian `long`s — **this is not RFC 4122 byte order**. Worked example for UUID `0a1b2c3d-4e5f-6789-abcd-ef0123456789`: `mostSignificantBits = 0x0a1b2c3d_4e5f_6789` → LE bytes `89 67 5f 4e 3d 2c 1b 0a`; `leastSignificantBits = 0xabcd_ef01_2345_6789` → LE bytes `89 67 45 23 01 ef cd ab`. The on-the-wire 16-byte sequence is the MSB bytes followed by the LSB bytes. | | 24 | var | Metadata tokens | Token stream | Filtered by `thinClientProxyExcludedSet` (see §Phase 2 header naming). | | 24+N | 4 | Payload length | uint32 LE | **Only present when the operation type implies a payload** (writes, patch, query body, stored-proc args, batch). Absence is signaled by operation-type convention, not a flag bit. Parsers must consult the operation-type → has-payload table derived from Java's `RntbdRequestArgs`. | | 28+N | var | Payload body | Raw bytes | JSON or Cosmos binary, per resource type. | @@ -283,7 +283,7 @@ These are wire-level HTTP/2 request headers on the outer POST to the proxy. They #### Range header wire format -EPK range headers (`x-ms-thinclient-range-min` / `-max`) carry the canonical, un-padded hex produced by `EffectivePartitionKey::compute_range()`. **Do not** zero-pad to N×32 on the wire. Local comparisons use `epk_length_aware_cmp` (in `container_routing_map.rs`, introduced by PR #4087) which correctly handles the mixed-length boundaries returned by the backend. `@analogrelay`'s earlier zero-padding proposal was **not** adopted; stay consistent with the length-aware convention. +EPK range headers (`x-ms-thinclient-range-min` / `-max`) carry the canonical, un-padded hex produced by `EffectivePartitionKey::compute_range()`. **Do not** zero-pad to N×32 on the wire. 
Local comparisons use `EffectivePartitionKey`'s `Ord` / `cmp` impl, which correctly handles the mixed-length boundaries returned by the backend; the `epk_cmp_*` tests in `container_routing_map.rs` (around L625–665) pin this behavior. The comparator is consumed via `binary_search_by(|r| r.min_inclusive.cmp(&epk_val))` (≈L282 of the same file). `@analogrelay`'s earlier zero-padding proposal in PR #4087 (commit `25233c903`) was **not** adopted; stay consistent with the length-aware convention. > **`Range` semantics footgun** (from PR #4087): `compute_range` returns a Rust `std::ops::Range` where `start == end` denotes a **point operation**. Standard `Range` iteration treats that as empty, so code that uses `.contains()` or iterates the range directly will misbehave. Always treat `start == end` as the point case explicitly. @@ -380,10 +380,39 @@ Two distinct fallback mechanisms — do not conflate them: | Name | Scope | Trigger | Duration | Unwind | | --- | --- | --- | --- | --- | | **Eligibility fallback** | Per-request | Operation is not eligible for Gateway 2.0 (fails `is_operation_supported_by_gateway20()`) | Single request only | N/A — recomputed every request | -| **Failure fallback** | Per-partition, sticky | N consecutive 503 `Proxy unreachable` or equivalent within a rolling window (target N=3, window=30s — confirm in implementation tuning) | Sticky until unwind | (a) next successful account-metadata refresh removes the affected `gateway20_url`, OR (b) a periodic probe to the proxy succeeds, OR (c) a fixed cooldown (target 60s) expires, whichever is first | +| **Failure fallback** | Per-partition, sticky | N consecutive *Proxy unreachable* events within a rolling window (target N=3, window=30s — confirm in implementation tuning). See "Proxy unreachable definition" below. 
| Sticky until unwind | (a) next successful account-metadata refresh removes the affected `gateway20_url`, OR (b) a periodic probe to the proxy succeeds, OR (c) a fixed cooldown (target 60s) expires, whichever is first | Failure fallback is per-partition rather than per-client so that one bad proxy region does not degrade requests to other partitions. Client-lifetime stickiness is explicitly **not** used — it would prevent recovery within the process. +##### Proxy unreachable definition + +For the purposes of the failure-fallback counter, a *Proxy unreachable* event is any of the following observed against a Gateway 2.0 endpoint, regardless of which retry layer surfaces it: + +1. **Transport-level failures** (no HTTP response received): + - TCP connect refused / timed out (`std::io::ErrorKind::ConnectionRefused`, `TimedOut`, `ConnectionReset`). + - TLS handshake failure (cert, ALPN, or protocol mismatch). + - HTTP/2 `GOAWAY` received before the request stream completes, or the underlying connection is dropped mid-request. + - `reqwest::Error` whose `is_connect()`, `is_timeout()`, or `is_request()` returns true *before any response status is observed*. +2. **HTTP infrastructure responses** (response received but the proxy itself is unhealthy or absent): + - HTTP `502 Bad Gateway`, `504 Gateway Timeout`. + - HTTP `503 Service Unavailable` **without** a Cosmos sub-status header (i.e., the response did not originate inside the proxy's Cosmos error path). + +The following are explicitly **not** *Proxy unreachable*: + +- Any response carrying a Cosmos sub-status header — those are server-routed errors and are handled by the regular Retry Decision Table. +- Application-level 4xx (those go through the normal cross-region retry policy and never trigger Failure fallback). +- HTTP `503` carrying a Cosmos sub-status (e.g., `SERVER_GENERATED_410`); see the "503 Unavailable | server-returned" row of the Retry Decision Table. 
+ +Where transport-level error classification overlaps with the broader transport pipeline, `TRANSPORT_PIPELINE_SPEC.md` is authoritative; this section narrows the set to those classes that count toward the Gateway 2.0 failure-fallback counter. + +##### Java parity + +The Phase 4 retry approach above — reuse the existing retry policies and add only the failure-fallback counter on top — matches Java's posture. In `Azure/azure-sdk-for-java`, `ThinClientStoreModel extends RxGatewayStoreModel` and a grep for `thin.?client` across `ClientRetryPolicy.java`, `WebExceptionRetryPolicy.java`, `ResourceThrottleRetryPolicy.java`, `BackoffRetryUtility.java`, `DocumentClientRetryPolicy.java`, `MetadataRequestRetryPolicy.java`, `MetadataThrottlingRetryPolicy.java`, `PartitionKeyRangeGoneRetryPolicy.java`, `StaleResourceRetryPolicy.java`, and `WriteRetryPolicy.java` returns **zero** matches. `RxDocumentClientImpl.getRetryPolicyForPointOperation()` builds the same `ResetSessionTokenRetryPolicy → PartitionKeyMismatchRetryPolicy → StaleResourceRetryPolicy` chain regardless of whether the request lands on the gateway or thin-client store model — model selection happens after the retry policy is built. + +The Rust failure-fallback counter described above is **more thin-client-aware than Java's**, which has no equivalent counter. Java's only thin-client awareness in the retry-adjacent layer is at the `GlobalEndpointManager` level (parallel `hasThinClientReadLocations` set and `getThinclientRegionalEndpoint()` URL selection); fault injection is also not yet integrated for thin client (`GlobalEndpointManager.java:161` carries a `// TODO: integrate thin client into fault injection`). + +> **Java behavioral nuance to avoid replicating**: `ClientRetryPolicy.markEndpointUnavailableFor{Read,Write}` operates on `gatewayRegionalEndpoint`, meaning a thin-client 503 in Java marks the **gateway** endpoint unavailable rather than the thin-client endpoint. 
The Rust failure-fallback counter is per-partition and keyed off the `gateway20_url` itself, which is the more precise behavior; do not "align" with Java by widening the unavailability scope. + #### Retry Decision Table | Response | Sub-Status | Action | @@ -400,7 +429,7 @@ Failure fallback is per-partition rather than per-client so that one bad proxy r | 449 Retry With | — | Retry same region (transient conflict) | | 503 Unavailable | server-returned | Mark endpoint unavailable, failover; increment failure-fallback counter | | 503 Unavailable | SDK-generated | Only retry if `SERVER_GENERATED_410` sub-status | -| Proxy unreachable | — | Increment failure-fallback counter; if threshold crossed, enter Failure fallback (§Fallback taxonomy) and route remainder through `TransportMode::Gateway` | +| Proxy unreachable (see §"Proxy unreachable definition") | — | Increment failure-fallback counter; if threshold crossed, enter Failure fallback (§Fallback taxonomy) and route remainder through `TransportMode::Gateway` | #### Files Changed From a2be8902672f569d45a0aed11fb94bc8a64e0497 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Mon, 20 Apr 2026 14:47:49 -0700 Subject: [PATCH 04/48] Cosmos: drop Java parity subsection from Gateway 2.0 spec Per follow-up review, remove the Phase 4 "Java parity" subsection. The cross-SDK observations belong in design discussion, not in the shipping design spec. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md | 8 -------- 1 file changed, 8 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index e9d369b118c..f9abd369339 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -405,14 +405,6 @@ The following are explicitly **not** *Proxy unreachable*: Where transport-level error classification overlaps with the broader transport pipeline, `TRANSPORT_PIPELINE_SPEC.md` is authoritative; this section narrows the set to those classes that count toward the Gateway 2.0 failure-fallback counter. -##### Java parity - -The Phase 4 retry approach above — reuse the existing retry policies and add only the failure-fallback counter on top — matches Java's posture. In `Azure/azure-sdk-for-java`, `ThinClientStoreModel extends RxGatewayStoreModel` and a grep for `thin.?client` across `ClientRetryPolicy.java`, `WebExceptionRetryPolicy.java`, `ResourceThrottleRetryPolicy.java`, `BackoffRetryUtility.java`, `DocumentClientRetryPolicy.java`, `MetadataRequestRetryPolicy.java`, `MetadataThrottlingRetryPolicy.java`, `PartitionKeyRangeGoneRetryPolicy.java`, `StaleResourceRetryPolicy.java`, and `WriteRetryPolicy.java` returns **zero** matches. `RxDocumentClientImpl.getRetryPolicyForPointOperation()` builds the same `ResetSessionTokenRetryPolicy → PartitionKeyMismatchRetryPolicy → StaleResourceRetryPolicy` chain regardless of whether the request lands on the gateway or thin-client store model — model selection happens after the retry policy is built. - -The Rust failure-fallback counter described above is **more thin-client-aware than Java's**, which has no equivalent counter. 
Java's only thin-client awareness in the retry-adjacent layer is at the `GlobalEndpointManager` level (parallel `hasThinClientReadLocations` set and `getThinclientRegionalEndpoint()` URL selection); fault injection is also not yet integrated for thin client (`GlobalEndpointManager.java:161` carries a `// TODO: integrate thin client into fault injection`). - -> **Java behavioral nuance to avoid replicating**: `ClientRetryPolicy.markEndpointUnavailableFor{Read,Write}` operates on `gatewayRegionalEndpoint`, meaning a thin-client 503 in Java marks the **gateway** endpoint unavailable rather than the thin-client endpoint. The Rust failure-fallback counter is per-partition and keyed off the `gateway20_url` itself, which is the more precise behavior; do not "align" with Java by widening the unavailability scope. - #### Retry Decision Table | Response | Sub-Status | Action | From 39827c9d71c90c060cf41cf16a0249668b511a6e Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Mon, 20 Apr 2026 14:52:27 -0700 Subject: [PATCH 05/48] Cosmos: resolve Q1/Q3 + clarify Q4 in Gateway 2.0 spec MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Q1 (HTTP/2 vs ALPN): resolved. Gateway 2.0 is HTTP/2-only with prior knowledge; the proxy does not accept HTTP/1.x. Negotiation failure feeds the failure-fallback counter rather than downgrading. - Q3 (EPK range header names): resolved. Proxy requires the Java names x-ms-thinclient-range-min / -max. Phase 2 introduces new THINCLIENT_RANGE_MIN / _MAX constants; START_EPK / END_EPK are not emitted on the Gateway 2.0 path. - Q4 (failure-fallback thresholds): clarified initial values (N=3 in 30s sliding window, 60s cooldown) and noted the live test pipeline is the tuning surface; thresholds are internal, not customer-tunable. Updates Phase 2 header naming table and §3.1 / §3.3 references accordingly. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../docs/GATEWAY_20_SPEC.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index f9abd369339..b3c36180d3e 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -91,8 +91,8 @@ The Rust driver (`azure_data_cosmos_driver`) already has significant gateway 2.0 - **`RoutingDecision`** — carries `transport_mode` that distinguishes gateway vs gateway 2.0 - **`ConnectionPoolOptions`** — `is_gateway20_allowed: bool` config (see §3.4 for gating model) - **`CosmosTransport`** — `dataplane_gateway20_transport: OnceLock`, lazy init with `AdaptiveTransport::gateway20()` -- **`AdaptiveTransport::ShardedGateway20`** variant — always HTTP/2 with prior knowledge -- **`HttpClientConfig::dataplane_gateway20()`** — HTTP/2-only config (see Open Question Q1 about prior-knowledge vs ALPN) +- **`AdaptiveTransport::ShardedGateway20`** variant — HTTP/2 only with prior knowledge (no HTTP/1.x fallback; the proxy does not accept HTTP/1.x — see Open Question Q1, resolved) +- **`HttpClientConfig::dataplane_gateway20()`** — HTTP/2-only config; HTTP/2 negotiation failure surfaces as a transport error (which feeds the failure-fallback counter, §Phase 4) rather than downgrading - **`TransportKind::Gateway20`** in diagnostics ### 3.2 Already Implemented — account metadata & routing @@ -108,7 +108,7 @@ The Rust driver (`azure_data_cosmos_driver`) already has significant gateway 2.0 ### 3.3 Already Implemented — EPK & constants - **`EffectivePartitionKey::compute()` / `::compute_range()`** — in `azure_data_cosmos_driver::models::effective_partition_key` (MultiHash-aware, hierarchical-PK correct). This is the canonical path and is what Gateway 2.0 header injection MUST call. 
Both functions return `azure_core::Result` (per PR #4087 review: MultiHash-requires-V2 and component-count checks are runtime errors, not `debug_assert`s, so a user gets `Err` rather than a panic on malformed input). -- **Constants** (in `azure_data_cosmos::constants`): `THINCLIENT_PROXY_OPERATION_TYPE` → `x-ms-thinclient-proxy-operation-type`, `THINCLIENT_PROXY_RESOURCE_TYPE` → `x-ms-thinclient-proxy-resource-type`, `START_EPK` → `x-ms-start-epk`, `END_EPK` → `x-ms-end-epk`. Phase 2 reuses these verbatim (see §Phase 2 "Header naming" for mapping). +- **Constants** (in `azure_data_cosmos::constants`): `THINCLIENT_PROXY_OPERATION_TYPE` → `x-ms-thinclient-proxy-operation-type`, `THINCLIENT_PROXY_RESOURCE_TYPE` → `x-ms-thinclient-proxy-resource-type`. Phase 2 reuses these verbatim. The existing `START_EPK` (= `x-ms-start-epk`) / `END_EPK` (= `x-ms-end-epk`) constants are **not** used on Gateway 2.0 requests; Phase 2 introduces new `THINCLIENT_RANGE_MIN` (= `x-ms-thinclient-range-min`) / `THINCLIENT_RANGE_MAX` (= `x-ms-thinclient-range-max`) constants per Q3 resolution. See §Phase 2 "Header naming" for mapping. - **Perf crate** — `gateway20_allowed` config wiring ### 3.4 Gating model (single source of truth) @@ -275,11 +275,11 @@ These are wire-level HTTP/2 request headers on the outer POST to the proxy. 
They | `x-ms-thinclient-proxy-operation-type` | `THINCLIENT_PROXY_OPERATION_TYPE` (SDK today; move to driver per §3.6-10) | Numeric operation type | Every Gateway 2.0 request | | `x-ms-thinclient-proxy-resource-type` | `THINCLIENT_PROXY_RESOURCE_TYPE` (SDK today; move) | Numeric resource type | Every Gateway 2.0 request | | `x-ms-effective-partition-key` | **NEW** — `EFFECTIVE_PARTITION_KEY` (driver) | Canonical EPK hex | Point ops only | -| `x-ms-thinclient-range-min` | **Reuse** `START_EPK` (= `x-ms-start-epk`) — confirm header name with service, or add new constant if the proxy requires `x-ms-thinclient-range-min` literally | Lower bound of EPK range | Feed / cross-partition ops only | -| `x-ms-thinclient-range-max` | **Reuse** `END_EPK` (= `x-ms-end-epk`) — same caveat | Upper bound of EPK range | Feed / cross-partition ops only | +| `x-ms-thinclient-range-min` | **NEW** — `THINCLIENT_RANGE_MIN` (driver) | Lower bound of EPK range | Feed / cross-partition ops only | +| `x-ms-thinclient-range-max` | **NEW** — `THINCLIENT_RANGE_MAX` (driver) | Upper bound of EPK range | Feed / cross-partition ops only | | `x-ms-cosmos-use-thinclient` | **NEW** (driver) | Instructs account-metadata response to advertise thin-client endpoints | Account metadata fetches only | -**Action item for Phase 2**: confirm with the service team whether the proxy expects `x-ms-start-epk` / `x-ms-end-epk` (existing constants) or `x-ms-thinclient-range-min` / `x-ms-thinclient-range-max` (Java naming). If the latter, introduce new constants and retire the former on the Gateway 2.0 path. +Per Q3 resolution, the Gateway 2.0 proxy requires the Java header names `x-ms-thinclient-range-min` / `x-ms-thinclient-range-max` (it does **not** accept `x-ms-start-epk` / `x-ms-end-epk`). Phase 2 introduces the new constants above; the existing `START_EPK` / `END_EPK` constants are not emitted on the Gateway 2.0 path. 
#### Range header wire format @@ -553,7 +553,7 @@ A **new dedicated CI pipeline** is required for gateway 2.0 live tests. Gateway ## 5. Open Questions -- **Q1 — HTTP/2 prior knowledge vs ALPN**: Rust already configures gateway 2.0 as HTTP/2 with prior knowledge. `TRANSPORT_PIPELINE_SPEC.md` settled ALPN as the default negotiation for the broader sharded transport. **Need service team confirmation** on which the Gateway 2.0 proxy expects. _Resolution_: pending. +- **Q1 — HTTP/2 prior knowledge vs ALPN**: _Resolved_. Gateway 2.0 always uses HTTP/2; the proxy does not accept HTTP/1.x. Rust uses HTTP/2 with prior knowledge on the Gateway 2.0 transport (no ALPN fallback to HTTP/1.x). The broader ALPN default in `TRANSPORT_PIPELINE_SPEC.md` does **not** apply to Gateway 2.0; if HTTP/2 negotiation fails, the request fails (and trips the failure-fallback counter, §Phase 4) rather than downgrading. - **Q2 — Live test account provisioning**: Cosmos DB account configuration flags required to enable gateway 2.0 / thin client endpoints are not part of the standard Bicep templates. _Resolution_: hardcode a dedicated, pre-provisioned thin client account for the gateway 2.0 live tests pipeline and reuse it across runs (rather than provisioning per-run via Bicep). Account name and credentials stored in pipeline secrets (`AZURE_COSMOS_GW20_ENDPOINT`, `AZURE_COSMOS_GW20_KEY`); pipeline reads endpoint from environment variables. -- **Q3 — EPK range header names**: Does the Gateway 2.0 proxy accept `x-ms-start-epk` / `x-ms-end-epk` (existing Rust constants) or require `x-ms-thinclient-range-min` / `x-ms-thinclient-range-max` (Java naming)? _Resolution_: pending Phase 2 confirmation with service team. -- **Q4 — Failure-fallback thresholds**: Target values are N=3 consecutive 503s in a 30s window, 60s cooldown. _Resolution_: pending implementation tuning against live test pipeline data. +- **Q3 — EPK range header names**: _Resolved_. 
The Gateway 2.0 proxy requires the Java header names `x-ms-thinclient-range-min` / `x-ms-thinclient-range-max`. Phase 2 introduces new constants (`THINCLIENT_RANGE_MIN`, `THINCLIENT_RANGE_MAX`) on the Gateway 2.0 path; the existing `START_EPK` / `END_EPK` (`x-ms-start-epk` / `x-ms-end-epk`) constants remain for any non-Gateway-2.0 callers but are **not** emitted on Gateway 2.0 requests. +- **Q4 — Failure-fallback thresholds**: Initial target values are **N=3 consecutive 503s in a 30s sliding window**, **60s cooldown** before retrying Gateway 2.0 for the affected partition. These are starting points; the live test pipeline (§Phase 6) is the tuning surface — values may be adjusted based on observed false-positive rates and recovery latencies before GA. Thresholds are not customer-tunable; they are internal driver constants. From 369735e24b5d989475b5b1abbcc7e2949442fc24 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Mon, 20 Apr 2026 14:56:13 -0700 Subject: [PATCH 06/48] Cosmos: drop Retry Decision Table from Gateway 2.0 spec Existing retry policies (ClientRetryPolicy and friends) already cover the rows in that table; the spec was duplicating cross-cutting behavior. Updated the surrounding "not Proxy unreachable" bullets to reference the existing retry path instead of the removed table. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../docs/GATEWAY_20_SPEC.md | 22 ++----------------- 1 file changed, 2 insertions(+), 20 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index b3c36180d3e..ef20203a9f5 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -399,30 +399,12 @@ For the purposes of the failure-fallback counter, a *Proxy unreachable* event is The following are explicitly **not** *Proxy unreachable*: -- Any response carrying a Cosmos sub-status header — those are server-routed errors and are handled by the regular Retry Decision Table. +- Any response carrying a Cosmos sub-status header — those are server-routed errors and are handled by the existing retry policies (unchanged from Gateway 1.0; see `ClientRetryPolicy` and friends). - Application-level 4xx (those go through the normal cross-region retry policy and never trigger Failure fallback). -- HTTP `503` carrying a Cosmos sub-status (e.g., `SERVER_GENERATED_410`); see the "503 Unavailable | server-returned" row of the Retry Decision Table. +- HTTP `503` carrying a Cosmos sub-status (e.g., `SERVER_GENERATED_410`) — handled by the existing 503 retry path. Where transport-level error classification overlaps with the broader transport pipeline, `TRANSPORT_PIPELINE_SPEC.md` is authoritative; this section narrows the set to those classes that count toward the Gateway 2.0 failure-fallback counter. 
-#### Retry Decision Table - -| Response | Sub-Status | Action | -| --- | --- | --- | -| 200-299 | — | Success | -| 404 | — | Not Found (propagate to caller) | -| 408 Timeout | — | Read: retry cross-region; Write: retry local only | -| 410 Gone | 1002 (PKRangeGone) | Refresh PKRange cache, retry | -| 410 Gone | 1007 (SplitMerge) | Refresh PKRange cache, retry | -| 410 Gone | 1008 (PartitionMigration) | Refresh PKRange cache, retry | -| 410 Gone | 1000 (NameCacheStale) | Refresh **collection** cache, retry | -| 410 Gone | other | Retry with backoff | -| 429 Throttled | — | Existing throttle retry loop (unchanged) | -| 449 Retry With | — | Retry same region (transient conflict) | -| 503 Unavailable | server-returned | Mark endpoint unavailable, failover; increment failure-fallback counter | -| 503 Unavailable | SDK-generated | Only retry if `SERVER_GENERATED_410` sub-status | -| Proxy unreachable (see §"Proxy unreachable definition") | — | Increment failure-fallback counter; if threshold crossed, enter Failure fallback (§Fallback taxonomy) and route remainder through `TransportMode::Gateway` | - #### Files Changed ``` From 9c44af1ef5107d5595493e44a8313d9f253b22f3 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Mon, 20 Apr 2026 15:06:40 -0700 Subject: [PATCH 07/48] Cosmos: drop Gateway 2.0-specific failure fallback from spec MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Java's thin client has no equivalent mechanism: ThinClientStoreModel extends RxGatewayStoreModel, model selection is per-request and stateless, and the existing ClientRetryPolicy / WebExceptionRetryPolicy chain handles transport errors, 502/503/504, and regional unavailability uniformly across both transport modes. Rust takes the same posture - no per-partition counter, no sticky standard-gateway state, no cooldown timer, no Proxy-unreachable classification, no new gateway20_retry.rs state machine. 
Removed: - Failure fallback row from the Phase 4 fallback taxonomy - Proxy unreachable definition subsection - Failure-fallback references in Phase 4 retry list, files-changed, and test matrix - Open Question Q4 (thresholds) - no longer applicable - Failure-fallback counter mentions in §3.1 and Q1 The single remaining fallback is the per-request Eligibility fallback (operation not supported by Gateway 2.0 -> standard gateway), which is unrelated to failure handling. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../docs/GATEWAY_20_SPEC.md | 46 ++++--------------- 1 file changed, 10 insertions(+), 36 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index ef20203a9f5..50c71f5009a 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -72,10 +72,10 @@ Gateway 2.0 moves partition-level routing intelligence from the SDK into the ser | Latency SLA | No | **Yes** | Yes | | Simple Network | Yes | Yes | No | | Protocol | HTTP/REST | RNTBD/HTTP2 | RNTBD/TCP | -| Replica Mgmt | Proxy | Proxy | SDK | -| Partition Route | Proxy | Proxy | SDK | +| Replica Mgmt | Gateway/Proxy | Proxy | SDK | +| Partition Route | Gateway/Proxy | Proxy | SDK | | Regional Route | SDK | SDK | SDK | -| SDK Complexity | Low | Medium | High | +| SDK Complexity | Medium | Medium | High | | Firewall Rules | 1 endpoint | 1 endpoint | N replicas | --- @@ -92,7 +92,7 @@ The Rust driver (`azure_data_cosmos_driver`) already has significant gateway 2.0 - **`ConnectionPoolOptions`** — `is_gateway20_allowed: bool` config (see §3.4 for gating model) - **`CosmosTransport`** — `dataplane_gateway20_transport: OnceLock`, lazy init with `AdaptiveTransport::gateway20()` - **`AdaptiveTransport::ShardedGateway20`** variant — HTTP/2 only with prior knowledge (no HTTP/1.x fallback; the proxy does not accept 
HTTP/1.x — see Open Question Q1, resolved) -- **`HttpClientConfig::dataplane_gateway20()`** — HTTP/2-only config; HTTP/2 negotiation failure surfaces as a transport error (which feeds the failure-fallback counter, §Phase 4) rather than downgrading +- **`HttpClientConfig::dataplane_gateway20()`** — HTTP/2-only config; HTTP/2 negotiation failure surfaces as a transport error (handled by the existing retry policies) rather than downgrading - **`TransportKind::Gateway20`** in diagnostics ### 3.2 Already Implemented — account metadata & routing @@ -370,48 +370,24 @@ Retry policies are identical between Gateway 2.0 and standard gateway modes in b - `COMPLETING_PARTITION_MIGRATION` (1008): Refresh PKRange cache, retry - `NAME_CACHE_IS_STALE` (1000): Refresh **collection** cache (NOT PKRange), retry - Other sub-statuses: Retry with backoff, no cache refresh -- **Gateway 2.0 failure-driven fallback** — see "Fallback taxonomy" below. +- **Gateway 2.0 eligibility fallback** — see "Fallback taxonomy" below. - **Partition-Level Failover interaction** — when PLF (see `PARTITION_LEVEL_FAILOVER_SPEC.md`) selects a region whose `CosmosEndpoint` has no `gateway20_url`, **PLF wins**: the request falls back to standard gateway **for that partition** until PLF releases its override. PLF precedence prevents Gateway 2.0 from overriding an explicit per-partition region choice. #### Fallback taxonomy -Two distinct fallback mechanisms — do not conflate them: +Gateway 2.0 has a single fallback mechanism: | Name | Scope | Trigger | Duration | Unwind | | --- | --- | --- | --- | --- | | **Eligibility fallback** | Per-request | Operation is not eligible for Gateway 2.0 (fails `is_operation_supported_by_gateway20()`) | Single request only | N/A — recomputed every request | -| **Failure fallback** | Per-partition, sticky | N consecutive *Proxy unreachable* events within a rolling window (target N=3, window=30s — confirm in implementation tuning). See "Proxy unreachable definition" below. 
| Sticky until unwind | (a) next successful account-metadata refresh removes the affected `gateway20_url`, OR (b) a periodic probe to the proxy succeeds, OR (c) a fixed cooldown (target 60s) expires, whichever is first | -Failure fallback is per-partition rather than per-client so that one bad proxy region does not degrade requests to other partitions. Client-lifetime stickiness is explicitly **not** used — it would prevent recovery within the process. - -##### Proxy unreachable definition - -For the purposes of the failure-fallback counter, a *Proxy unreachable* event is any of the following observed against a Gateway 2.0 endpoint, regardless of which retry layer surfaces it: - -1. **Transport-level failures** (no HTTP response received): - - TCP connect refused / timed out (`std::io::ErrorKind::ConnectionRefused`, `TimedOut`, `ConnectionReset`). - - TLS handshake failure (cert, ALPN, or protocol mismatch). - - HTTP/2 `GOAWAY` received before the request stream completes, or the underlying connection is dropped mid-request. - - `reqwest::Error` whose `is_connect()`, `is_timeout()`, or `is_request()` returns true *before any response status is observed*. -2. **HTTP infrastructure responses** (response received but the proxy itself is unhealthy or absent): - - HTTP `502 Bad Gateway`, `504 Gateway Timeout`. - - HTTP `503 Service Unavailable` **without** a Cosmos sub-status header (i.e., the response did not originate inside the proxy's Cosmos error path). - -The following are explicitly **not** *Proxy unreachable*: - -- Any response carrying a Cosmos sub-status header — those are server-routed errors and are handled by the existing retry policies (unchanged from Gateway 1.0; see `ClientRetryPolicy` and friends). -- Application-level 4xx (those go through the normal cross-region retry policy and never trigger Failure fallback). -- HTTP `503` carrying a Cosmos sub-status (e.g., `SERVER_GENERATED_410`) — handled by the existing 503 retry path. 
- -Where transport-level error classification overlaps with the broader transport pipeline, `TRANSPORT_PIPELINE_SPEC.md` is authoritative; this section narrows the set to those classes that count toward the Gateway 2.0 failure-fallback counter. +There is intentionally **no** Gateway 2.0–specific failure-fallback mechanism (no per-partition consecutive-failure counter, no sticky standard-gateway state, no cooldown). Java's thin client takes the same posture: `ThinClientStoreModel extends RxGatewayStoreModel`, model selection is per-request and stateless via `useThinClientStoreModel()`, and the existing `ClientRetryPolicy` / `WebExceptionRetryPolicy` chain already handles transport errors, 502/503/504, and regional unavailability uniformly across both transport modes. Rust follows the same approach: when a Gateway 2.0 request fails, the existing retry policies retry it (which may re-select Gateway 2.0 or land on standard gateway through normal regional-failover behavior); no new state machine is introduced. #### Files Changed ``` EDIT src/driver/pipeline/operation_pipeline.rs — Gateway 2.0 retry classification + PLF precedence -EDIT src/driver/pipeline/components.rs — Add Gateway20FailureFallback state if needed -EDIT src/driver/transport/transport_pipeline.rs — Wire failure-fallback counter into transport -NEW src/driver/pipeline/gateway20_retry.rs — Gateway 2.0 failure-fallback state machine +EDIT src/driver/pipeline/components.rs — Gateway 2.0 retry surface integration ``` --- @@ -502,10 +478,9 @@ A **new dedicated CI pipeline** is required for gateway 2.0 live tests. 
Gateway | Bulk | | | Yes | Fan-out CRUD, distinct from Batch | | Change feed | | | Yes | LatestVersion, incremental | | Retry: 408 timeout | | Yes | | Cross-region for reads, local-only for writes | -| Retry: 503 | | Yes | | Regional failover; failure-fallback trigger and unwind | +| Retry: 503 | | Yes | | Regional failover via existing retry policies | | Retry: 410 Gone | | Yes | | PKRange refresh (sub-status specific); NameCacheStale → collection cache | | Eligibility fallback | | Yes | | StoredProc Execute → standard gateway | -| Failure fallback | | Yes | | Proxy down → sticky standard gateway; unwind via metadata refresh / cooldown | | PLF precedence | | Yes | | Region without gw20_url + PLF override → standard gateway path | | Multi-region failover | | Yes | Yes | Preferred regions, failover | | Fault injection | | Yes | | Timeout, 503, network error | @@ -535,7 +510,6 @@ A **new dedicated CI pipeline** is required for gateway 2.0 live tests. Gateway ## 5. Open Questions -- **Q1 — HTTP/2 prior knowledge vs ALPN**: _Resolved_. Gateway 2.0 always uses HTTP/2; the proxy does not accept HTTP/1.x. Rust uses HTTP/2 with prior knowledge on the Gateway 2.0 transport (no ALPN fallback to HTTP/1.x). The broader ALPN default in `TRANSPORT_PIPELINE_SPEC.md` does **not** apply to Gateway 2.0; if HTTP/2 negotiation fails, the request fails (and trips the failure-fallback counter, §Phase 4) rather than downgrading. +- **Q1 — HTTP/2 prior knowledge vs ALPN**: _Resolved_. Gateway 2.0 always uses HTTP/2; the proxy does not accept HTTP/1.x. Rust uses HTTP/2 with prior knowledge on the Gateway 2.0 transport (no ALPN fallback to HTTP/1.x). The broader ALPN default in `TRANSPORT_PIPELINE_SPEC.md` does **not** apply to Gateway 2.0; if HTTP/2 negotiation fails, the request fails and the existing retry policies handle it. 
- **Q2 — Live test account provisioning**: Cosmos DB account configuration flags required to enable gateway 2.0 / thin client endpoints are not part of the standard Bicep templates. _Resolution_: hardcode a dedicated, pre-provisioned thin client account for the gateway 2.0 live tests pipeline and reuse it across runs (rather than provisioning per-run via Bicep). Account name and credentials stored in pipeline secrets (`AZURE_COSMOS_GW20_ENDPOINT`, `AZURE_COSMOS_GW20_KEY`); pipeline reads endpoint from environment variables. - **Q3 — EPK range header names**: _Resolved_. The Gateway 2.0 proxy requires the Java header names `x-ms-thinclient-range-min` / `x-ms-thinclient-range-max`. Phase 2 introduces new constants (`THINCLIENT_RANGE_MIN`, `THINCLIENT_RANGE_MAX`) on the Gateway 2.0 path; the existing `START_EPK` / `END_EPK` (`x-ms-start-epk` / `x-ms-end-epk`) constants remain for any non-Gateway-2.0 callers but are **not** emitted on Gateway 2.0 requests. -- **Q4 — Failure-fallback thresholds**: Initial target values are **N=3 consecutive 503s in a 30s sliding window**, **60s cooldown** before retrying Gateway 2.0 for the affected partition. These are starting points; the live test pipeline (§Phase 6) is the tuning surface — values may be adjusted based on observed false-positive rates and recovery latencies before GA. Thresholds are not customer-tunable; they are internal driver constants. From 1fa81be7e154fb3232cdbd04920d1c3f17872526 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Mon, 20 Apr 2026 15:37:56 -0700 Subject: [PATCH 08/48] Cosmos: fix RNTBD expansion in Gateway 2.0 spec RNTBD is "Real Name To Be Determined" - a placeholder name that stuck, not "Reliable Network Transfer Binary Data" (a backronym that LLM analysis tends to invent). 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index 50c71f5009a..7d414b63d0a 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -171,7 +171,7 @@ Invariants this spec locks in: **Crate**: `azure_data_cosmos_driver` **New module**: `src/driver/transport/rntbd/` -The RNTBD (Reliable Network Transfer Binary Data) protocol is the wire format used by Cosmos DB for efficient binary communication. Gateway 2.0 wraps RNTBD-encoded payloads inside HTTP/2 POST requests to the proxy. +The RNTBD ("Real Name To Be Determined" — a placeholder name that stuck) protocol is the wire format used by Cosmos DB for efficient binary communication. Gateway 2.0 wraps RNTBD-encoded payloads inside HTTP/2 POST requests to the proxy. #### What Will Be Done From 785fab974cd4eea05ba65af6c6f790c9b609749b Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Mon, 20 Apr 2026 16:55:55 -0700 Subject: [PATCH 09/48] Cosmos: fix Gateway 2.0 spec analyze failures Two fixes for the PR analyze stage: - cspell: add inline `cspell:ignore` directive for spec-specific jargon (THINCLIENT, thinclient, Mgmt, cutover, directconnectivity, footgun, cooldown, ALPN). Scoped to this file rather than the global word list since these are spec-only terms. - Link verification: convert relative sibling-spec links (TRANSPORT_PIPELINE_SPEC.md, PARTITION_KEY_RANGE_CACHE_SPEC.md, PARTITION_LEVEL_FAILOVER_SPEC.md) to absolute GitHub URLs as required by the Verify-Links guideline (sdk/ paths are not in allow-relative-links.txt). 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index 7d414b63d0a..bc68fce20b7 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -1,3 +1,4 @@ + # Gateway 2.0 Design Spec for Rust Driver & SDK **Status**: Draft / Iterating @@ -16,9 +17,9 @@ ### Related Specs -- [`TRANSPORT_PIPELINE_SPEC.md`](./TRANSPORT_PIPELINE_SPEC.md) — sharded HTTP/2 transport, timeout regime, hedging, `(HttpClient, host:port)` shard key. Gateway 2.0 reuses the sharded transport defined there verbatim; this spec does **not** introduce a new timeout or hedging policy. -- [`PARTITION_KEY_RANGE_CACHE_SPEC.md`](./PARTITION_KEY_RANGE_CACHE_SPEC.md) — PKRange cache semantics and `EffectivePartitionKey` usage; cited by Phase 2 for EPK computation and by Phase 4 for 410 handling. -- [`PARTITION_LEVEL_FAILOVER_SPEC.md`](./PARTITION_LEVEL_FAILOVER_SPEC.md) — per-partition region override semantics; cited by Phase 4 for PLF precedence over Gateway 2.0 routing. +- [`TRANSPORT_PIPELINE_SPEC.md`](https://github.com/Azure/azure-sdk-for-rust/blob/main/sdk/cosmos/azure_data_cosmos_driver/docs/TRANSPORT_PIPELINE_SPEC.md) — sharded HTTP/2 transport, timeout regime, hedging, `(HttpClient, host:port)` shard key. Gateway 2.0 reuses the sharded transport defined there verbatim; this spec does **not** introduce a new timeout or hedging policy. +- [`PARTITION_KEY_RANGE_CACHE_SPEC.md`](https://github.com/Azure/azure-sdk-for-rust/blob/main/sdk/cosmos/azure_data_cosmos_driver/docs/PARTITION_KEY_RANGE_CACHE_SPEC.md) — PKRange cache semantics and `EffectivePartitionKey` usage; cited by Phase 2 for EPK computation and by Phase 4 for 410 handling. 
+- [`PARTITION_LEVEL_FAILOVER_SPEC.md`](https://github.com/Azure/azure-sdk-for-rust/blob/main/sdk/cosmos/azure_data_cosmos_driver/docs/PARTITION_LEVEL_FAILOVER_SPEC.md) — per-partition region override semantics; cited by Phase 4 for PLF precedence over Gateway 2.0 routing. --- From cd77426fa2decc1109d4e961e2cce42499d34733 Mon Sep 17 00:00:00 2001 From: Tomas Varon <70857381+tvaron3@users.noreply.github.com> Date: Tue, 21 Apr 2026 11:17:15 -0700 Subject: [PATCH 10/48] Update sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index bc68fce20b7..a5e5d6150d2 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -135,7 +135,7 @@ Invariants this spec locks in: 4. **`x-ms-cosmos-use-thinclient` header** on account metadata requests (to trigger thin-client endpoint advertisement) 5. **SDK-to-driver cutover for EPK** — SDK call sites (`feed_range_from_partition_key`, `container_connection.rs:87`) still call the broken SDK hash; they must route through the driver's `EffectivePartitionKey::compute()` 6. **Session token handling** — Gateway 2.0 may handle session tokens differently (partition-key-range-id prefix) -7. **Gateway 2.0 specific fallback** — Failure-driven fallback from Gateway 2.0 to standard gateway (see Phase 4) +7. **Rollout/cutover policy clarification** — Document the intended enablement and cutover behavior (see Phase 4); there is intentionally **no** Gateway 2.0-specific failure-driven fallback to the standard gateway 8. **Integration/E2E tests** — No gateway 2.0 test coverage beyond the routing-systems unit tests 9. 
**Fault injection** — No gateway 2.0 fault injection scenarios 10. **Constants cross-crate visibility** — `THINCLIENT_PROXY_*` and `START_EPK` / `END_EPK` currently live in `azure_data_cosmos::constants` but Phase 2 injects headers from the driver crate. Options (to decide in Phase 2): (a) move constants to `azure_data_cosmos_driver::constants` and re-export from SDK, (b) re-export SDK constants through a driver-side `pub use`, or (c) duplicate. Recommend (a). From 6891309772033d550e7957a95baa6a1a7c1dcc28 Mon Sep 17 00:00:00 2001 From: Tomas Varon <70857381+tvaron3@users.noreply.github.com> Date: Tue, 21 Apr 2026 11:17:29 -0700 Subject: [PATCH 11/48] Update sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index a5e5d6150d2..c25e0c594e4 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -443,7 +443,7 @@ A **new dedicated CI pipeline** is required for gateway 2.0 live tests. Gateway - Single-region gateway 2.0 - Multi-region gateway 2.0 with failover -- Gateway 2.0 + standard gateway fallback (both eligibility and failure-driven) +- Gateway 2.0 + standard gateway eligibility fallback (per-request only; normal retries still apply) **Test Suites:** From 07535d73ad5ac4f08dd1db71d32ffe1f9704402b Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Tue, 21 Apr 2026 11:23:05 -0700 Subject: [PATCH 12/48] Cosmos: refine Gateway 2.0 spec retry/timeout/PLF wording - Timeout policy: note that any Gateway 2.0-specific timeout tuning is deferred to a follow-up. 
- PLF interaction: PLF picks the region; within that region Gateway 2.0 is preferred whenever a gateway20_url is available, otherwise the request falls back to standard gateway. Removes the previous "PLF wins" framing that implied PLF always defeats Gateway 2.0. - Drop the 408/503/410 retry-behavior bullets. The section already states retry policies are identical to standard gateway, so re-listing them risked drift from the canonical policy. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md | 12 ++---------- 1 file changed, 2 insertions(+), 10 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index c25e0c594e4..03a647a38a8 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -362,17 +362,9 @@ Retry policies are identical between Gateway 2.0 and standard gateway modes in b #### What Will Be Done -- **Timeout policy** — Gateway 2.0 requests use the timeout regime defined in `TRANSPORT_PIPELINE_SPEC.md` (single timeout, not bifurcated). Do not introduce Gateway-2.0-specific timeouts. -- **Read timeout cross-region retry** — On HTTP 408 with `GATEWAY_ENDPOINT_READ_TIMEOUT` sub-status, retry read operations in the next preferred region. -- **Service unavailable (503)** — Mark endpoint unavailable for partition key range, then retry. Follow Java's conservative approach: only retry server-returned 503 or SDK-generated 503 with `SERVER_GENERATED_410` sub-status. 
-- **Gone (410)** — Action depends on sub-status code: - - `PARTITION_KEY_RANGE_GONE` (1002): Refresh PKRange cache, retry - - `COMPLETING_SPLIT_OR_MERGE` (1007): Refresh PKRange cache, retry - - `COMPLETING_PARTITION_MIGRATION` (1008): Refresh PKRange cache, retry - - `NAME_CACHE_IS_STALE` (1000): Refresh **collection** cache (NOT PKRange), retry - - Other sub-statuses: Retry with backoff, no cache refresh +- **Timeout policy** — Gateway 2.0 requests use the timeout regime defined in `TRANSPORT_PIPELINE_SPEC.md` (single timeout, not bifurcated). Do not introduce Gateway-2.0-specific timeouts in this work; any Gateway 2.0–specific timeout tuning will be addressed in a follow-up. - **Gateway 2.0 eligibility fallback** — see "Fallback taxonomy" below. -- **Partition-Level Failover interaction** — when PLF (see `PARTITION_LEVEL_FAILOVER_SPEC.md`) selects a region whose `CosmosEndpoint` has no `gateway20_url`, **PLF wins**: the request falls back to standard gateway **for that partition** until PLF releases its override. PLF precedence prevents Gateway 2.0 from overriding an explicit per-partition region choice. +- **Partition-Level Failover interaction** — when PLF (see `PARTITION_LEVEL_FAILOVER_SPEC.md`) selects a region, the per-request decision is: if that region's `CosmosEndpoint` exposes a `gateway20_url`, the request uses Gateway 2.0; if it does not, the request falls back to standard gateway for that partition until PLF releases its override. PLF chooses the region; Gateway 2.0 is preferred whenever it is available in that region. 
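The per-request PLF and eligibility decision can be sketched in Rust as follows. This is a minimal illustration, not the driver's implementation: `TransportMode`, `RegionEndpoint`, and `select_transport` are hypothetical names. The sketch assumes the region has already been chosen (by PLF when it holds an override for the partition, otherwise by normal regional routing), and `operation_supported` stands in for `is_operation_supported_by_gateway20()`.

```rust
/// Transport used for a single request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TransportMode {
    Gateway20,
    StandardGateway,
}

/// Hypothetical, reduced view of the region selected for this request.
/// Only the field the decision needs is modeled.
pub struct RegionEndpoint<'a> {
    /// `Some` when the region advertises a Gateway 2.0 proxy URL.
    pub gateway20_url: Option<&'a str>,
}

/// Recomputed on every request; no state carries over between calls.
/// `operation_supported` stands in for `is_operation_supported_by_gateway20()`.
pub fn select_transport(region: &RegionEndpoint<'_>, operation_supported: bool) -> TransportMode {
    match (region.gateway20_url, operation_supported) {
        // The region exposes a proxy and the operation is eligible.
        (Some(_), true) => TransportMode::Gateway20,
        // No proxy in this region (e.g. PLF pinned a region without one),
        // or the operation is ineligible (eligibility fallback).
        _ => TransportMode::StandardGateway,
    }
}
```

The function holds no mutable state, so nothing persists between calls. This matches the stateless, recomputed-per-request model: PLF picks the region, and Gateway 2.0 is used whenever that region offers it.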
#### Fallback taxonomy From b2fa3b3d3b144e7330e897ef0f8b63b140bd3bc6 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Tue, 21 Apr 2026 16:20:38 -0700 Subject: [PATCH 13/48] Cosmos: address Gateway 2.0 spec review comments MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Apply Kevin's unresolved review feedback on PR #4223: - Reword §1 Overview to say "RNTBD binary serialization over the HTTP/2 protocol" (clearer separation of serialization vs. transport). - Soften the SLA latency bullet in §2 Key Benefits to "plans to provide contractual latency commitments" — we have not updated contractual terms yet, so avoid overpromising ahead of broad-usage measurement. - Add a cross-partition query aggregation bullet under §2 Design Philosophy → SDK Responsibility, noting it stays client-side under Gateway 2.0 (no server-side aggregation). - Fix the Protocol row in the Connection Mode Comparison table: HTTP/REST → REST/HTTP. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index 03a647a38a8..8a200414548 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -25,7 +25,7 @@ ## 1. Overview -Gateway 2.0 (formerly "thin client") is a server-side proxy that allows SDK clients to route data-plane operations through a lightweight proxy endpoint instead of directly to backend replicas. It uses RNTBD binary protocol over HTTP/2, with the proxy handling partition routing, replica selection, and load balancing. 
+Gateway 2.0 (formerly "thin client") is a server-side proxy that allows SDK clients to route data-plane operations through a lightweight proxy endpoint instead of directly to backend replicas. It uses RNTBD binary serialization over the HTTP/2 protocol, with the proxy handling partition routing, replica selection, and load balancing. **Naming**: Use "Gateway 2.0" consistently in all Rust code, docs, and comments. Avoid "thin client" except when referencing Java/.NET code or existing constants (`THINCLIENT_*`). @@ -44,7 +44,7 @@ Traditional Cosmos DB offers two connection modes: ### Key Benefits -- **SLA latency guarantees** — Unlike traditional gateway, Gateway 2.0 provides contractual latency commitments comparable to direct mode +- **SLA latency guarantees** — Unlike traditional gateway, Gateway 2.0 plans to provide contractual latency commitments comparable to direct mode - **Simplified networking** — Clients connect to a single regional proxy endpoint over HTTPS; no need to open firewall rules to individual backend replicas - **Reduced SDK complexity** — The proxy handles replica discovery, connection management, and partition-level routing; the SDK only needs RNTBD serialization and endpoint selection - **HTTP/2 multiplexing** — Multiple concurrent operations share a single TCP connection, reducing connection overhead vs. 
direct mode's per-replica TCP connections @@ -59,6 +59,7 @@ Gateway 2.0 moves partition-level routing intelligence from the SDK into the ser - Regional endpoint selection - RNTBD serialization - EPK header injection +- Cross-partition query aggregation (unchanged from Gateway/Direct modes — the SDK continues to issue per-partition sub-queries and aggregate results client-side; Gateway 2.0 does not server-side aggregate) **Gateway 2.0 Proxy (Server-Side):** @@ -72,7 +73,7 @@ Gateway 2.0 moves partition-level routing intelligence from the SDK into the ser | --- | --- | --- | --- | | Latency SLA | No | **Yes** | Yes | | Simple Network | Yes | Yes | No | -| Protocol | HTTP/REST | RNTBD/HTTP2 | RNTBD/TCP | +| Protocol | REST/HTTP | RNTBD/HTTP2 | RNTBD/TCP | | Replica Mgmt | Gateway/Proxy | Proxy | SDK | | Partition Route | Gateway/Proxy | Proxy | SDK | | Regional Route | SDK | SDK | SDK | From 82a9ec11eaf734fb470fb28a180fb67020bf5e7f Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Sun, 26 Apr 2026 19:09:13 -0700 Subject: [PATCH 14/48] Cosmos: Address PR #4223 round-5 review on Gateway 2.0 spec Updates the Gateway 2.0 design spec in response to PR #4223 review feedback from analogrelay and FabianMeiswinkel. - Routing ownership: clarify SDK keeps regional + partition-level routing; only replica-level routing within a partition moves to the proxy. - API correctness: replace the fictional 'CosmosClient::create_item(T)' walkthrough with the real 'ContainerClient::create_item(partition_key, item, options)' signature in sections 4.1 and 4.2. - SDK <-> driver boundary: mark section 3.6-10 resolved. Gateway-2.0 constants ('THINCLIENT_PROXY_*', range headers, etc.) live exclusively in 'azure_data_cosmos_driver::constants' with no SDK re-export. The SDK invokes the generic 'CosmosDriver::execute_operation' interface and the driver decides Gateway 2.0 vs standard gateway internally. - SPROC: drop the .NET-vs-Java framing. 
Stored-procedure execution is out of scope for Rust SDK GA; eligibility fallback routes any incoming SPROC request to the standard gateway. - New gap (3.6-11): pre-Phase-2 audit deliverable to enumerate every EPK/PartitionKeyRange-shaped struct across both crates and consolidate on a single canonical type. - Operator override: restore 'CosmosClientOptions::gateway20_disabled' (default false) as the single supported public kill-switch. No env var (intentional discouragement of casual / fleet-wide enablement). Carries an explicit warning that flipping it voids Gateway 2.0's latency SLA and impacts 24/7 Microsoft support eligibility for performance regressions. - New gap (3.6-12): retry behavior for 449 (Retry-With: same endpoint, standard backoff, no region switch) and 404 / sub-status 1002 (PARTITION_KEY_RANGE_GONE: refresh PKRange cache, prefer remote region; PLF region wins when PLF has pinned the PKRangeId). Phase 6 test matrix expanded with both rows. - HPK gating refinement (carried forward from round 4): only emit 'x-ms-documentdb-partitionkey' alongside the EPK header(s) when the request carries the FULL partition key (point ops on any container, and full-key single-logical-partition queries on HPK containers). Prefix queries on HPK containers emit the EPK-range headers only. - Prose scrub: 'thin client' / 'thin-client' rewritten to 'Gateway 2.0' in body prose. Wire-header literals ('x-ms-thinclient-*', 'thinClientReadableLocations'), Rust API symbol names ('has_thin_client_endpoints'), and .NET / Java symbol references retained verbatim. - cspell pass clean. 
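Below is a minimal sketch of the once-per-request `prefer_gateway20` gate with the operator override applied. It assumes the three inputs are already available as plain booleans. `Gateway20Inputs` and its field names are hypothetical stand-ins for `CosmosClientOptions::gateway20_disabled`, `ConnectionPoolOptions::is_gateway20_allowed`, and `has_thin_client_endpoints()`. The real computation lives in `resolve_endpoint`.

```rust
/// Hypothetical bundle of the inputs to the Gateway 2.0 gate.
#[derive(Debug, Default, Clone, Copy)]
pub struct Gateway20Inputs {
    /// Stand-in for `CosmosClientOptions::gateway20_disabled` (default `false`).
    pub gateway20_disabled: bool,
    /// Stand-in for the internal `ConnectionPoolOptions::is_gateway20_allowed`.
    pub pool_allows_gateway20: bool,
    /// Stand-in for `account.has_thin_client_endpoints()`.
    pub account_has_gateway20_endpoints: bool,
}

/// Evaluated once per request; downstream stages read the resulting
/// routing decision instead of re-deriving eligibility.
pub fn prefer_gateway20(inputs: Gateway20Inputs) -> bool {
    // The operator override short-circuits everything else.
    !inputs.gateway20_disabled
        && inputs.pool_allows_gateway20
        && inputs.account_has_gateway20_endpoints
}
```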
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../docs/GATEWAY_20_SPEC.md | 190 ++++++++++++++---- 1 file changed, 156 insertions(+), 34 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index 8a200414548..847125647c8 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -1,4 +1,4 @@ - + # Gateway 2.0 Design Spec for Rust Driver & SDK **Status**: Draft / Iterating @@ -46,25 +46,26 @@ Traditional Cosmos DB offers two connection modes: - **SLA latency guarantees** — Unlike traditional gateway, Gateway 2.0 plans to provide contractual latency commitments comparable to direct mode - **Simplified networking** — Clients connect to a single regional proxy endpoint over HTTPS; no need to open firewall rules to individual backend replicas -- **Reduced SDK complexity** — The proxy handles replica discovery, connection management, and partition-level routing; the SDK only needs RNTBD serialization and endpoint selection +- **Reduced SDK complexity** — The proxy handles replica discovery, connection management, and replica-level routing within a partition; the SDK only needs RNTBD serialization, partition-level routing (PKRange resolution / EPK computation), and endpoint selection - **HTTP/2 multiplexing** — Multiple concurrent operations share a single TCP connection, reducing connection overhead vs. direct mode's per-replica TCP connections - **Transparent failover** — The proxy handles replica failover within a partition; the SDK handles regional failover across proxy endpoints ### Design Philosophy -Gateway 2.0 moves partition-level routing intelligence from the SDK into the server-side proxy while keeping regional routing in the SDK. 
This gives the best of both worlds: +Gateway 2.0 moves **replica-level** routing intelligence from the SDK into the server-side proxy while keeping **regional and partition-level** routing in the SDK. The SDK still resolves PKRanges, computes EPK headers, and selects the regional endpoint; what moves to the proxy is the per-request choice of which replica within a partition serves the operation, plus the connection management and load balancing that go with it. This gives the best of both worlds: **SDK Responsibility:** - Regional endpoint selection +- Partition routing (PKRange resolution, EPK→PKRangeId mapping) - RNTBD serialization - EPK header injection - Cross-partition query aggregation (unchanged from Gateway/Direct modes — the SDK continues to issue per-partition sub-queries and aggregate results client-side; Gateway 2.0 does not server-side aggregate) **Gateway 2.0 Proxy (Server-Side):** -- Partition routing -- Replica selection +- Replica selection within a partition +- Connection management - Load balancing ### Connection Mode Comparison @@ -120,9 +121,13 @@ Two independent guards exist today (`is_gateway20_allowed` is checked in both ro Invariants this spec locks in: - `prefer_gateway20` is computed **once per request** during `resolve_endpoint` from: - `connection_pool().is_gateway20_allowed() && account.has_thin_client_endpoints()` + `!options.gateway20_disabled && connection_pool().is_gateway20_allowed() && account.has_thin_client_endpoints()` - After `resolve_endpoint`, downstream stages MUST trust `RoutingDecision.transport_mode` and not re-derive eligibility. -- `ConnectionPoolOptions.is_gateway20_allowed` and its env var `AZURE_COSMOS_CONNECTION_POOL_IS_GATEWAY20_ALLOWED` are an **unsupported, undocumented kill switch** reserved for emergency fallback. They are NOT exposed on `CosmosClientOptions` and may be removed without notice.
+- **Operator override: `CosmosClientOptions::gateway20_disabled` (default `false`)** — Customers and operators MAY set `gateway20_disabled = true` on `CosmosClientOptions` to force every request from the client to route through the standard gateway, even when the account advertises Gateway 2.0 endpoints and the operation would otherwise be eligible. + + ⚠️ **Setting this flag voids the latency-SLA story Gateway 2.0 is being built to deliver. It also impacts the ability to receive 24/7 Microsoft support for performance regressions on this client. Use only when explicitly directed by Microsoft Support during incident triage.** The flag is intentionally **not** exposed via environment variable to discourage casual / fleet-wide enablement; operators who need it must opt in per-client through code. + + The internal `ConnectionPoolOptions::is_gateway20_allowed` flag and its env var `AZURE_COSMOS_CONNECTION_POOL_IS_GATEWAY20_ALLOWED` are pre-existing bring-up scaffolding, slated for removal in Phase 5 cleanup. The public `gateway20_disabled` setting is the single supported disablement mechanism going forward. ### 3.5 Known broken / do-not-use @@ -130,16 +135,22 @@ Invariants this spec locks in: ### 3.6 Not Yet Implemented (Gaps) -1. **RNTBD serialization/deserialization** — No binary protocol encoding/decoding exists -2. **Gateway 2.0 header injection** — Thin client proxy headers and EPK range headers are not applied to requests on the Gateway 2.0 path +1. **RNTBD serialization/deserialization** — No binary protocol encoding/decoding exists. Both directions live in the driver: serialization in `rntbd/request.rs`, deserialization in `rntbd/response.rs`. The SDK never handles raw RNTBD bytes — the response decode happens inside the driver and the SDK only sees typed results. See §4.2 step 6. +2. **Gateway 2.0 header injection** — Gateway 2.0 proxy headers and EPK range headers are not applied to requests on the Gateway 2.0 path 3. 
**Supported operation filtering** — No `IsOperationSupportedByThinClient()` equivalent 4. **`x-ms-cosmos-use-thinclient` header** on account metadata requests (to trigger thin-client endpoint advertisement) 5. **SDK-to-driver cutover for EPK** — SDK call sites (`feed_range_from_partition_key`, `container_connection.rs:87`) still call the broken SDK hash; they must route through the driver's `EffectivePartitionKey::compute()` 6. **Session token handling** — Gateway 2.0 may handle session tokens differently (partition-key-range-id prefix) -7. **Rollout/cutover policy clarification** — Document the intended enablement and cutover behavior (see Phase 4); there is intentionally **no** Gateway 2.0-specific failure-driven fallback to the standard gateway +7. **Rollout/cutover policy clarification** — Document the intended enablement and cutover behavior (see Phase 4); there is intentionally **no** Gateway 2.0-specific failure-driven fallback to the standard gateway. The supported operator override is `CosmosClientOptions::gateway20_disabled` (§3.4) — a per-client opt-out with explicit SLA / support warnings. 8. **Integration/E2E tests** — No gateway 2.0 test coverage beyond the routing-systems unit tests 9. **Fault injection** — No gateway 2.0 fault injection scenarios -10. **Constants cross-crate visibility** — `THINCLIENT_PROXY_*` and `START_EPK` / `END_EPK` currently live in `azure_data_cosmos::constants` but Phase 2 injects headers from the driver crate. Options (to decide in Phase 2): (a) move constants to `azure_data_cosmos_driver::constants` and re-export from SDK, (b) re-export SDK constants through a driver-side `pub use`, or (c) duplicate. Recommend (a). +10. **Constants cross-crate visibility** — _Resolved_. Per PR review (analogrelay): the SDK exposes no Gateway-2.0 constants, headers, or routing logic; its only Gateway-2.0 setting is the `CosmosClientOptions::gateway20_disabled` opt-out (§3.4). `THINCLIENT_PROXY_*`, `THINCLIENT_RANGE_MIN/MAX`, and Gateway-2.0-specific header constants live exclusively in `azure_data_cosmos_driver::constants`; **no SDK re-export**.
The SDK calls the generic `CosmosDriver::execute_operation` interface and the driver decides Gateway 2.0 vs standard gateway internally. The legacy `START_EPK` / `END_EPK` constants in `azure_data_cosmos::constants` remain for any non-Gateway-2.0 callers but are not used on the Gateway 2.0 path. Phase 2 deliverable includes the move. +11. **EPK Range type consolidation** — There appear to be multiple `EpkRange` / `PartitionKeyRange` / EPK-bound representations across `azure_data_cosmos` and `azure_data_cosmos_driver`. **Pre-Phase-2 audit deliverable**: enumerate every EPK-range-shaped struct in both crates, document overlap, and pick one canonical representation. Phase 2's EPK header injection MUST reuse the chosen canonical type — it must not introduce a new EPK-range type. Track in PR review of the Phase 2 implementation. +12. **Gateway 2.0 retry behavior for region-routed status codes** — Beyond the timeout / 408 handling already deferred to `TRANSPORT_PIPELINE_SPEC.md`, the Gateway 2.0 path inherits these region-aware retry rules from the standard pipeline (no Gateway-2.0-specific override needed): + - **HTTP 449 (Retry-With)** — Retry against the **same** Gateway 2.0 endpoint with the standard backoff schedule. **Do not** switch regions on 449. **Do not** fall back to standard gateway on 449 — the proxy is healthy; the backend asked for a retry. + - **HTTP 404 with sub-status `1002` (`PARTITION_KEY_RANGE_GONE`)** — Refresh the PKRange cache, then retry. **Always prefer a remote region for the retry** when one is available in the client's preferred-region list — the local region is suspected of carrying the stale routing, so pinning the retry to the same Gateway 2.0 endpoint that just returned 1002 reproduces the bug. **PLF takes precedence**: if PLF (per `PARTITION_LEVEL_FAILOVER_SPEC.md`) has already pinned a region for this PKRangeId, the PLF region wins over the "prefer remote" hint. 
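The two retry rules above could be expressed as a small classification function. This is a hedged sketch. `RetryAction` and `classify_gateway20_retry` are hypothetical, regions are plain strings, and backoff timing is omitted. Every status/sub-status pair other than the two listed is deferred to the existing retry policies.

```rust
/// Outcome of classifying a failed Gateway 2.0 response (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RetryAction {
    /// Retry on the same Gateway 2.0 endpoint after the standard backoff.
    RetrySameEndpoint,
    /// Refresh the PKRange cache, then retry in the named region.
    RefreshPkRangesAndRetryIn(String),
    /// Everything else is left to the existing retry policies.
    DeferToExistingPolicies,
}

/// `current_region` returned the error; `plf_region` is the region PLF has
/// pinned for this PKRangeId, if any; `preferred_regions` is in client order.
pub fn classify_gateway20_retry<'a>(
    status: u16,
    sub_status: u32,
    current_region: &'a str,
    plf_region: Option<&'a str>,
    preferred_regions: &[&'a str],
) -> RetryAction {
    match (status, sub_status) {
        // 449: same endpoint, no region switch, no fallback to standard gateway.
        (449, _) => RetryAction::RetrySameEndpoint,
        // 404 / 1002 as described above: a PLF-pinned region wins; otherwise
        // prefer the first remote region, falling back to the current region
        // only if no remote region is configured.
        (404, 1002) => {
            let target = plf_region
                .or_else(|| preferred_regions.iter().copied().find(|r| *r != current_region))
                .unwrap_or(current_region);
            RetryAction::RefreshPkRangesAndRetryIn(target.to_string())
        }
        _ => RetryAction::DeferToExistingPolicies,
    }
}
```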
+ + These rules apply uniformly to V1 (HTTP) and V2 (RNTBD) — the retry policy operates on the resolved `(status_code, sub_status)` pair before the transport-specific deserializer ever sees the body. --- @@ -147,7 +158,7 @@ Invariants this spec locks in: ### 4.1 Current Request Flow (Gateway 1.0) -1. `CosmosClient::create_item(T)` calls `ContainerClient` +1. The caller invokes `ContainerClient::create_item(partition_key, item, options)` with `item: T` 2. `container_connection.rs` serializes `T` to `&[u8]`, computes EPK (via the broken SDK hash today — see §3.5), resolves PKRange 3. `CosmosDriver::execute_operation()` enters the Operation Pipeline (7-stage loop) 4. `resolve_endpoint()` selects a gateway endpoint @@ -156,13 +167,14 @@ Invariants this spec locks in: ### 4.2 Target Request Flow (Gateway 2.0) -1. `CosmosClient::create_item(T)` calls `ContainerClient` +1. The caller invokes `ContainerClient::create_item(partition_key, item, options)` with `item: T` 2. `container_connection.rs` serializes `T` to `&[u8]`; EPK computation is deferred to the driver (via `EffectivePartitionKey::compute()` / `::compute_range()`), which then resolves PKRange 3. `CosmosDriver::execute_operation()` enters the Operation Pipeline (7-stage loop) 4. `resolve_endpoint()` prefers gateway 2.0 endpoint (if `prefer_gateway20` per §3.4) 5. Transport Pipeline checks `is_operation_supported_by_gateway20()`: - **YES**: Inject gateway 2.0 headers + RNTBD serialize → HTTP/2 POST to Gateway 2.0 Proxy (SLA) - **NO**: Standard HTTP/REST request to Cosmos Gateway (eligibility fallback — per-request, deterministic) +6. Driver deserializes the RNTBD response (24-byte frame header → metadata token stream → optional body payload, per §Phase 1) into a domain `RntbdResponse`, then maps the body bytes to the typed result (`T`, `FeedResponse`, etc.) before returning to the SDK. The SDK never sees the raw RNTBD bytes — that boundary stays in the driver, mirroring the EPK-pushdown decision in step 2.
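The metadata-token walk in step 6, including the forward-compat rule that unrecognized token IDs are skipped, can be sketched as follows. This is a minimal illustration, not the driver's implementation. It assumes each token is laid out as a 2-byte little-endian token ID, then a 1-byte wire-type tag, then the value. The type-tag numbers reflect our reading of Java's `RntbdTokenType` and must be pinned against the Java source. `value_extent` and `parse_tokens` are hypothetical names.

```rust
use std::collections::HashMap;

/// Returns `(prefix_len, value_len)` for a token value of wire type `ty`,
/// reading the length prefix from `rest` for variable-length types.
/// ASSUMPTION: type codes follow our reading of Java's `RntbdTokenType`.
fn value_extent(ty: u8, rest: &[u8]) -> Option<(usize, usize)> {
    match ty {
        0x00 => Some((0, 1)),               // Byte
        0x01 => Some((0, 2)),               // UShort
        0x02 | 0x03 | 0x0D => Some((0, 4)), // ULong, Long, Float
        0x04 | 0x05 | 0x0E => Some((0, 8)), // ULongLong, LongLong, Double
        0x06 => Some((0, 16)),              // Guid
        0x07 | 0x0A => rest.first().map(|&n| (1, n as usize)), // 1-byte length prefix
        0x08 | 0x0B => {
            // 2-byte little-endian length prefix
            let b = rest.get(..2)?;
            Some((2, u16::from_le_bytes([b[0], b[1]]) as usize))
        }
        0x09 | 0x0C => {
            // 4-byte little-endian length prefix
            let b = rest.get(..4)?;
            Some((4, u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize))
        }
        _ => None, // unknown wire type: the value size cannot be determined
    }
}

/// Walks a metadata-token stream, keeping the raw value bytes of tokens whose
/// ID is in `known` and silently skipping every other token ID.
fn parse_tokens(mut buf: &[u8], known: &[u16]) -> Result<HashMap<u16, Vec<u8>>, String> {
    let mut out = HashMap::new();
    while !buf.is_empty() {
        if buf.len() < 3 {
            return Err("truncated token header".to_string());
        }
        let id = u16::from_le_bytes([buf[0], buf[1]]);
        let ty = buf[2];
        buf = &buf[3..];
        let (prefix, len) =
            value_extent(ty, buf).ok_or("unknown wire type or truncated length prefix")?;
        let end = prefix
            .checked_add(len)
            .filter(|&e| e <= buf.len())
            .ok_or("truncated token value")?;
        if known.contains(&id) {
            out.insert(id, buf[prefix..end].to_vec());
        }
        // Unrecognized IDs fall through: their bytes are consumed with no error and no log.
        buf = &buf[end..];
    }
    Ok(out)
}
```

The skip relies on the wire-type tag rather than the token ID. An unrecognized ID can always be stepped over, but a token whose *type* is unknown cannot, so that case remains a hard error.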
> **Naming**: The function is `is_operation_supported_by_gateway20()` throughout. Older drafts used `is_supported_by_gw20()` — do not reintroduce the abbreviation. @@ -185,7 +197,27 @@ The RNTBD ("Real Name To Be Determined" — a placeholder name that stuck) proto #### Versioning -Thin client RNTBD has no version negotiation on the wire. The proxy advertises a single supported frame format per endpoint and rejects mismatched frames at the HTTP layer (the HTTP/2 request fails rather than triggering an RNTBD version-mismatch error). Direct-mode RNTBD has version negotiation (`CURRENT_PROTOCOL_VERSION = 0x00000001`); **do not** apply that pattern here. +Gateway 2.0 RNTBD has no version negotiation on the wire. The proxy advertises a single supported frame format per endpoint and rejects mismatched frames at the HTTP layer (the HTTP/2 request fails rather than triggering an RNTBD version-mismatch error). Direct-mode RNTBD has version negotiation (`CURRENT_PROTOCOL_VERSION = 0x00000001`); **do not** apply that pattern here. + +#### Metadata token filtering (forward-compat contract) + +The Rust deserializer **must** treat the RNTBD response metadata-token stream as forward-compatible: + +- **Recognized response tokens** (mirror Java's `RntbdResponseHeader` set, finalized against Java source during implementation): request charge, session token, continuation token, activity-id echo, sub-status code, retry-after-milliseconds, LSN, partition-key-range-id, global-committed-lsn, item-lsn, transport-request-id, owner-id, and similar metadata. The exact token-ID enum is part of `rntbd/tokens.rs` (§"What Will Be Done"). +- **Unknown token IDs MUST be silently skipped** (read the token's wire-type tag, consume the value, whose size is either fixed by that type or given by its length prefix, and continue) — the deserializer must NOT panic, return an error, or fail the response, and must NOT log per-token (silent skip is the contract).
The proxy is free to add new metadata tokens at any time and the driver must remain forward-compatible across proxy upgrades that ship before the corresponding Rust release. This silent-tolerance behavior is the *implementation* of the `IgnoreUnknownRntbdTokens` capability bit advertised over the `x-ms-cosmos-sdk-supportedcapabilities` header (see "SDK-supported-capabilities advertisement" below) — the proxy/backend assumes the SDK will not surface or warn on unknown tokens, so per-token logging is unnecessary noise. +- **Inverse contract on the request side**: the request serializer drops headers that appear in `thinClientProxyExcludedSet` (see §"RNTBD Request Wire Format" Notes column). That set enumerates headers the proxy does not understand on the inbound RNTBD frame; emitting them would be either ignored or rejected. + +Phase 6's "RNTBD unknown-token tolerance" unit test pins this behavior: a hand-crafted response frame containing a synthetic unrecognized token ID must round-trip without error and surface every recognized token correctly. + +#### SDK-supported-capabilities advertisement + +The Rust SDK already wires the HTTP request header `x-ms-cosmos-sdk-supportedcapabilities` (`COSMOS_SDK_SUPPORTEDCAPABILITIES`, `azure_data_cosmos/src/constants.rs:157`) and emits it on every gateway request from `azure_data_cosmos_driver/src/driver/transport/cosmos_headers.rs:14-31`. Today the value sent over the wire is the literal string `"0"` — i.e., zero capabilities advertised. + +Phase 1 must change the emitted value to the bitmask `(PartitionMerge | IgnoreUnknownRntbdTokens)`, matching the minimum capability set the .NET SDK asserts in its contract tests (`SDKSupportedCapabilities.cs`). 
The header value is a string-encoded decimal of the bitwise OR of the enum bits; the precise integer value should be looked up against `SDKSupportedCapabilities.cs` at implementation time and committed as a Rust constant alongside the existing `COSMOS_SDK_SUPPORTEDCAPABILITIES` header name. + +The `IgnoreUnknownRntbdTokens` bit is the contract that backs the silent-skip behavior in "Metadata token filtering" above: the proxy/backend uses this advertisement to decide whether it is safe to add new RNTBD tokens without coordinating with this SDK release. Advertising the bit while *also* failing or warning on unknown tokens would be a contract violation; advertising `"0"` while silently skipping unknown tokens is "merely conservative" but causes the proxy to assume zero forward-compat tolerance — both are wrong. Phase 1 must reconcile both ends. + +Phase 6 test coverage: assert the header value emitted on Gateway 2.0 (and standard Gateway) requests is the expected bitmask string, not `"0"`. #### RNTBD Request Wire Format @@ -239,14 +271,14 @@ This phase wires RNTBD serialization into the existing transport pipeline and ad #### What Will Be Done -- **Operation filtering** — `is_operation_supported_by_gateway20(resource_type, operation_type) → bool`. Following Java (`ThinClientStoreModel`), only `ResourceType::Document` operations are eligible. The .NET position (`IsOperationSupportedByThinClient` additionally allows `StoredProcedure::ExecuteJavaScript`) is **intentionally not adopted**. +- **Operation filtering** — `is_operation_supported_by_gateway20(resource_type, operation_type) → bool`. Following Java (`ThinClientStoreModel`), only `ResourceType::Document` operations are eligible. All other resource types — including stored-procedure execution, which is **out of scope for Rust SDK GA** — fall through to the standard gateway via the eligibility-fallback path. 
- **EPK computation** — Call `EffectivePartitionKey::compute()` (point) or `::compute_range()` (feed/cross-partition) from the driver layer. Do **not** call `azure_data_cosmos::hash::get_hashed_partition_key_string` (§3.5). SDK call sites that currently use it must route through the driver's implementation as part of this phase. - **EPK error propagation** — If EPK computation returns `Err` (MultiHash-requires-V2, component-count mismatch, etc.), surface as `CosmosStatus::BadRequest` to the caller. **Do not** fall back to standard gateway — the same inputs would be equally broken there. -- **Header injection** — When `transport_mode == Gateway20`, inject the thin-client headers listed below. +- **Header injection** — When `transport_mode == Gateway20`, inject the Gateway 2.0 headers listed below. - **Request body wrapping** — Serialize the entire request (headers + body) into RNTBD binary format and POST as the HTTP/2 body. - **Response unwrapping** — Deserialize the RNTBD response body back into `CosmosResponseHeaders` + raw document bytes. - **Eligibility fallback** — Operation ineligible for Gateway 2.0 → route through standard gateway for this single request (per-request, deterministic). See §Phase 4 for the distinct failure-driven fallback. -- **Constants placement** — Resolve the cross-crate constants question from §3.6-10 (recommend: move `THINCLIENT_PROXY_*` and `START_EPK` / `END_EPK` to a driver-side module, re-export from SDK). +- **Constants placement** — Move `THINCLIENT_PROXY_*` (and any other Gateway-2.0-specific header constants) into `azure_data_cosmos_driver::constants` as part of Phase 2. **No SDK re-export** — the SDK has no Gateway-2.0 awareness; it invokes the generic `CosmosDriver::execute_operation` interface and the driver decides Gateway 2.0 vs standard gateway internally. See §3.6-10 (resolved). 
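The eligibility rule above reduces to a small per-request decision once it is combined with the `gateway20_disabled` opt-out and with whether the target region advertises a Gateway 2.0 URL. This is a minimal sketch. The `ResourceType`, `OperationType`, and `Route` enums and the `choose_route` helper are hypothetical simplifications, not the driver's real types. The ReadFeed change-feed-mode check (LatestVersion only) is omitted because it needs more input than `(resource_type, operation_type)`.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ResourceType { Document, StoredProcedure, Collection, Database }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OperationType { Create, Read, Replace, Upsert, Patch, Delete, Query, ReadFeed, Batch, ExecuteJavaScript }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Route { Gateway20, StandardGateway }

/// Following Java's `ThinClientStoreModel`, only Document operations qualify;
/// stored-procedure execution and metadata resources fall through.
fn is_operation_supported_by_gateway20(resource_type: ResourceType, _operation_type: OperationType) -> bool {
    resource_type == ResourceType::Document
}

/// Per-request routing: Gateway 2.0 is chosen only when the operator has not
/// disabled it, the target region advertises a Gateway 2.0 endpoint, and the
/// operation is eligible. Any "no" deterministically selects the standard gateway.
fn choose_route(
    gateway20_disabled: bool,
    has_gateway20_endpoint: bool,
    rt: ResourceType,
    op: OperationType,
) -> Route {
    if !gateway20_disabled && has_gateway20_endpoint && is_operation_supported_by_gateway20(rt, op) {
        Route::Gateway20
    } else {
        Route::StandardGateway
    }
}
```

The function is pure, so the whole eligibility matrix (including the StoredProc Execute rejection) can be unit-tested without a transport.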
#### Supported Operations @@ -265,7 +297,7 @@ Only `ResourceType::Document` is eligible for gateway 2.0 (following Java's appr | ReadFeed | Yes | LatestVersion change feed only; excludes AllVersionsAndDeletes | | Batch | Yes | Transactional same-PK batch (single resource, single request). | | Bulk | Yes | SDK-side fan-out of independent CRUD ops; each fan-out leg is a separate eligible Document op. Distinct from Batch. | -| StoredProcedure Execute | **No** | Following Java; Rust does **not** follow .NET's `ExecuteJavaScript` allowance. | +| StoredProcedure Execute | **No** | Stored-procedure execution is out of scope for Rust SDK GA. Eligibility fallback routes any incoming SPROC request to the standard gateway. | | All other resource types | **No** | Metadata operations use standard gateway | #### Header naming (proxy headers, in HTTP/2 request headers — not RNTBD tokens) @@ -274,15 +306,89 @@ These are wire-level HTTP/2 request headers on the outer POST to the proxy. They | Header (wire) | Rust constant (crate) | Semantics | When emitted | | --- | --- | --- | --- | -| `x-ms-thinclient-proxy-operation-type` | `THINCLIENT_PROXY_OPERATION_TYPE` (SDK today; move to driver per §3.6-10) | Numeric operation type | Every Gateway 2.0 request | -| `x-ms-thinclient-proxy-resource-type` | `THINCLIENT_PROXY_RESOURCE_TYPE` (SDK today; move) | Numeric resource type | Every Gateway 2.0 request | +| `x-ms-thinclient-proxy-operation-type` | `THINCLIENT_PROXY_OPERATION_TYPE` (driver) | Numeric operation type | Every Gateway 2.0 request | +| `x-ms-thinclient-proxy-resource-type` | `THINCLIENT_PROXY_RESOURCE_TYPE` (driver) | Numeric resource type | Every Gateway 2.0 request | +| `x-ms-thinclient-account-name` | **NEW** — `THINCLIENT_ACCOUNT_NAME` (driver) | Global database account name (e.g., `myacct` from `myacct.documents.azure.com`); region-independent tenant identity. 
Source: .NET `BaseProxyClientHttpMessageHandler.AccountName` (`/Product/SDK/.net/Microsoft.Azure.Cosmos.Friends/src/BaseProxyClientHttpMessageHandler.cs:20`); value matches `GlobalDatabaseAccountName` (compute-gateway side: `SqlApiOperationHandler.cs:1135`). | Every Gateway 2.0 request | +| `x-ms-thinclient-regional-account-name` | **NEW** — `THINCLIENT_REGIONAL_ACCOUNT_NAME` (driver) | Region-stamped document-service identity, format `<account-name>-<region-name>` lowercase (e.g., `myacct-eastus`). Source: .NET `BaseProxyClientHttpMessageHandler.RegionalAccountName` (`BaseProxyClientHttpMessageHandler.cs:22`); value matches `DocumentServiceId`; region-format derivation matches `AdminEndpointActions.cs:6236-6237`. | Every Gateway 2.0 request | | `x-ms-effective-partition-key` | **NEW** — `EFFECTIVE_PARTITION_KEY` (driver) | Canonical EPK hex | Point ops only | +| `x-ms-documentdb-partitionkey` | existing `PARTITION_KEY` constant (SDK) | JSON-encoded partition-key value | Point ops AND single-logical-partition query ops — emitted **alongside** `x-ms-effective-partition-key` **only when the request carries the full partition-key value** (see HPK note below). For HPK containers scoped to a prefix of the partition-key definition, this header is **omitted** and only the EPK / EPK-range headers are sent. | | `x-ms-thinclient-range-min` | **NEW** — `THINCLIENT_RANGE_MIN` (driver) | Lower bound of EPK range | Feed / cross-partition ops only | | `x-ms-thinclient-range-max` | **NEW** — `THINCLIENT_RANGE_MAX` (driver) | Upper bound of EPK range | Feed / cross-partition ops only | | `x-ms-cosmos-use-thinclient` | **NEW** (driver) | Instructs account-metadata response to advertise thin-client endpoints | Account metadata fetches only | Per Q3 resolution, the Gateway 2.0 proxy requires the Java header names `x-ms-thinclient-range-min` / `x-ms-thinclient-range-max` (it does **not** accept `x-ms-start-epk` / `x-ms-end-epk`).
Phase 2 introduces the new constants above; the existing `START_EPK` / `END_EPK` constants are not emitted on the Gateway 2.0 path. +**Account-name + regional-account-name headers (proxy tenant routing)**: both `x-ms-thinclient-account-name` and `x-ms-thinclient-regional-account-name` are emitted on **every** Gateway 2.0 request (point, feed, batch, bulk, change feed, etc.) — this is the established proxy contract today, not future-proofing. The proxy uses the two headers in tandem: + +- **Account-name** identifies the tenant (which Cosmos account this request belongs to) so the proxy can look up the correct backend federation. Region-independent. +- **Regional-account-name** pins the request to a specific regional document-service / compute-gateway federation. This is required because globally-distributed accounts have one federation per region — writes must land in the write region while reads can stay local — so the proxy must know not just *which account* but also *which physical regional federation* to route to without re-resolving on every request. + +Source values: + +- Account-name = the host label of the account endpoint URL (the `myacct` portion of `myacct.documents.azure.com`), parsed once at client construction. +- Regional-account-name = `<account-name>-<region-name>` lowercase, where `region-name` is the location string of the target `CosmosEndpoint` (e.g., `myacct-eastus`). The Rust driver's per-`CosmosEndpoint` region context (already maintained by the PLF / preferred-region machinery) is the natural source. + +Both headers mirror .NET's `BaseProxyClientHttpMessageHandler` (the shared base class used by `ThinClientHttpMessageHandler`, `DqsClientHttpMessageHandler`, and `DtcClientHttpMessageHandler` — i.e., every proxy path emits both unconditionally).
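Both header values can be derived from data the driver already holds. This is a minimal std-only sketch. It assumes the endpoint arrives as a URL string and the region as its display location (e.g. `East US`). The helper names are hypothetical. The region normalization (lowercased, whitespace removed) is inferred from the `myacct-eastus` example rather than taken from the .NET source.

```rust
const THINCLIENT_ACCOUNT_NAME: &str = "x-ms-thinclient-account-name";
const THINCLIENT_REGIONAL_ACCOUNT_NAME: &str = "x-ms-thinclient-regional-account-name";

/// Extracts the host label, e.g. `myacct` from `https://myacct.documents.azure.com:443/`.
fn account_name_from_endpoint(endpoint: &str) -> Option<String> {
    let without_scheme = endpoint.split_once("://").map_or(endpoint, |(_, rest)| rest);
    // Drop any path or port, then keep the first DNS label.
    let host = without_scheme.split(|c: char| c == '/' || c == ':').next()?;
    let label = host.split('.').next()?;
    if label.is_empty() {
        None
    } else {
        Some(label.to_ascii_lowercase())
    }
}

/// Builds `<account-name>-<region-name>`.
/// ASSUMPTION: the region is lowercased with whitespace removed (inferred from `myacct-eastus`).
fn regional_account_name(account: &str, region: &str) -> String {
    let region: String = region.chars().filter(|c| !c.is_whitespace()).collect();
    format!("{}-{}", account, region.to_lowercase())
}

/// Both headers go on every Gateway 2.0 request; `region` is the active endpoint's location.
fn account_headers(account: &str, region: &str) -> [(&'static str, String); 2] {
    [
        (THINCLIENT_ACCOUNT_NAME, account.to_string()),
        (THINCLIENT_REGIONAL_ACCOUNT_NAME, regional_account_name(account, region)),
    ]
}
```

The account label is parsed once at client construction. Only the regional value has to be recomputed, or looked up per `CosmosEndpoint`, when the active region changes.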
+ +**PK header alongside EPK (HPK full-key gating)**: the driver emits `x-ms-documentdb-partitionkey` (the raw, JSON-encoded partition-key value) **only when the operation carries the full partition-key value** — i.e., the number of components supplied equals the container's partition-key definition arity. This applies to both point operations and single-logical-partition query operations. + +- **Single-component (non-HPK) containers**: every point op and every single-logical-partition query supplies the full PK by definition, so the header is always emitted alongside `x-ms-effective-partition-key`. +- **Hierarchical (multi-component, HPK) containers**: + - Full-key request (component count == definition arity) → emit BOTH `x-ms-documentdb-partitionkey` AND `x-ms-effective-partition-key`. The proxy can use the raw PK to skip recomputing EPK and to choose finer-grained replica selection than EPK alone allows. + - Prefix-key request (component count < definition arity) → emit ONLY the EPK-range carriers (`x-ms-thinclient-range-min` / `-max`). Do **not** emit `x-ms-documentdb-partitionkey` with a partial value, because the proxy treats that header as the canonical full PK and a partial value would route incorrectly. +- **Cross-partition feed / query ops**: continue to emit only the EPK range headers — no PK header on the feed path regardless of HPK arity. + +Gating is decided at header-injection time using the partition-key definition (already cached on the container) and the operation's supplied PK component count; no runtime computation cost beyond a length compare. + +#### Consistency header reconciliation (`ConsistencyLevel` ↔ `ReadConsistencyStrategy`) + +The Cosmos SDK exposes two consistency knobs that can both target the same read operation: + +- **`ConsistencyLevel`** — per-request override of the account default consistency. 
+- **`ReadConsistencyStrategy`** (defined in `azure_data_cosmos_driver::options::read_consistency`) — read-only strategy override (`Default`, `Eventual`, `Session`, `LatestCommitted`, `GlobalStrong`); supersedes `ConsistencyLevel` on reads. + +This subsection is the Rust mirror of the cross-SDK design landed in [Java PR #48787](https://github.com/Azure/azure-sdk-for-java/pull/48787) (with .NET parity in PR #5685 and proxy-side changes coordinated via internal ADO PR #2031635). Wire-format and resolution semantics MUST match Java/.NET so that a single proxy-side validation suite is sufficient. + +##### Wire carriers + +| Transport | Wire carrier for the resolved value | Encoding | +| --- | --- | --- | +| Standard Gateway (V1, HTTP) | HTTP request header `x-ms-cosmos-read-consistency-strategy` (per Java `HttpConstants.READ_CONSISTENCY_STRATEGY`) | String, exact case-sensitive values: `"Eventual"`, `"Session"`, `"LatestCommitted"`, `"GlobalStrong"`. Header is omitted entirely when the resolved RCS is `Default`. | +| Gateway 2.0 (RNTBD) | RNTBD metadata token ID `0x00F0` | **Byte** type — `Eventual = 0x01`, `Session = 0x02`, `LatestCommitted = 0x03`, `GlobalStrong = 0x04`. The token MUST be Byte-encoded; per the Java PR an earlier String-typed prototype caused the proxy to hang. The token is omitted entirely when the resolved RCS is `Default`. | + +The byte values are pinned against the proxy's C++ enum (proxy ADO PR #2031635). Phase 1's RNTBD token catalog grows a row for `ReadConsistencyStrategy = 0x00F0 (Byte)` enumerating the four byte values. + +##### Resolution precedence + +A single resolution step runs in the driver pipeline (alongside the existing `is_session_effective` computation in `operation_pipeline.rs`) **before** transport selection. It produces exactly one resolved consistency value, which is then handed off to whichever transport (V1 HTTP or V2 RNTBD) carries it on the wire. + +Sources, highest precedence first: + +1. 
Request-level `ReadConsistencyStrategy` (read ops only) +2. Request-level `ConsistencyLevel` +3. Client-level `ReadConsistencyStrategy` (read ops only) +4. Client-level `ConsistencyLevel` +5. Account default consistency (no header / no token emitted; backend applies its default) + +`ReadConsistencyStrategy::Default` at any level is a pass-through — falls through to the next source. Write operations skip steps 1 and 3 entirely (RCS is read-only); writes resolve from steps 2/4/5. + +##### Dual-header rejection rule + +The compute gateway rejects requests that carry both `x-ms-consistency-level` AND `x-ms-cosmos-read-consistency-strategy`. The Rust driver MUST therefore enforce mutual exclusion on both transports: + +- **V1 HTTP**: when resolved RCS is non-Default, the driver sends only `x-ms-cosmos-read-consistency-strategy` and **strips** any `x-ms-consistency-level` from the outgoing header set. When resolved RCS is Default, the driver sends only `x-ms-consistency-level` (if a `ConsistencyLevel` was resolved at any level) and omits the RCS header. +- **V2 RNTBD**: same mutual exclusion applied to the RNTBD metadata stream — emit either the `ConsistencyLevel` token or the `ReadConsistencyStrategy` token (`0x00F0`), never both. The Gateway 2.0 RNTBD serializer consumes the **already-resolved** value and decides which of the two tokens to emit; it does not re-run resolution. + +##### GlobalStrong client-side validation + +When the resolved RCS is `GlobalStrong` and the account default consistency is **not** `Strong`, the driver MUST fail the operation **before** transport selection / serialization with a `BadRequestException`-equivalent (Rust: `azure_core::Error` with the appropriate `ErrorKind`). This avoids a wasted round-trip and matches Java's fail-fast semantics. The check uses the cached account properties already maintained by the driver; no additional metadata fetch is required. 
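The precedence list, the mutual-exclusion rule, and the GlobalStrong check can all be captured in pure functions. This also keeps resolution free of header-map mutation, because it reads its inputs and returns a value. This is a minimal sketch with hypothetical local enums; the real `ReadConsistencyStrategy` lives in `azure_data_cosmos_driver::options::read_consistency`. Errors here are plain `String`s rather than `azure_core::Error`. The wire strings and byte values are copied from the carrier table above.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ConsistencyLevel { Strong, BoundedStaleness, Session, Eventual, ConsistentPrefix }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ReadConsistencyStrategy { Default, Eventual, Session, LatestCommitted, GlobalStrong }

/// A single resolved value, so at most one wire carrier can ever be emitted.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Resolved {
    AccountDefault,                    // emit neither carrier
    Level(ConsistencyLevel),           // x-ms-consistency-level / ConsistencyLevel token
    Strategy(ReadConsistencyStrategy), // RCS header / RNTBD token 0x00F0
}

#[derive(Clone, Copy)]
struct Inputs {
    is_read: bool,
    request_rcs: Option<ReadConsistencyStrategy>,
    request_cl: Option<ConsistencyLevel>,
    client_rcs: Option<ReadConsistencyStrategy>,
    client_cl: Option<ConsistencyLevel>,
}

/// Request-RCS > Request-CL > Client-RCS > Client-CL > account default.
/// `Default` is a pass-through, and both RCS layers are skipped for writes.
fn resolve(i: &Inputs) -> Resolved {
    let rcs = |r: Option<ReadConsistencyStrategy>| {
        r.filter(|s| i.is_read && *s != ReadConsistencyStrategy::Default)
            .map(Resolved::Strategy)
    };
    rcs(i.request_rcs)
        .or_else(|| i.request_cl.map(Resolved::Level))
        .or_else(|| rcs(i.client_rcs))
        .or_else(|| i.client_cl.map(Resolved::Level))
        .unwrap_or(Resolved::AccountDefault)
}

/// Fails fast, before transport selection, when GlobalStrong targets a non-Strong account.
fn validate(resolved: Resolved, account_default: ConsistencyLevel) -> Result<Resolved, String> {
    match resolved {
        Resolved::Strategy(ReadConsistencyStrategy::GlobalStrong)
            if account_default != ConsistencyLevel::Strong =>
        {
            Err("GlobalStrong requires an account whose default consistency is Strong".to_string())
        }
        other => Ok(other),
    }
}

/// V1 value for `x-ms-cosmos-read-consistency-strategy` (`None` = omit the header).
fn rcs_header_value(s: ReadConsistencyStrategy) -> Option<&'static str> {
    match s {
        ReadConsistencyStrategy::Default => None,
        ReadConsistencyStrategy::Eventual => Some("Eventual"),
        ReadConsistencyStrategy::Session => Some("Session"),
        ReadConsistencyStrategy::LatestCommitted => Some("LatestCommitted"),
        ReadConsistencyStrategy::GlobalStrong => Some("GlobalStrong"),
    }
}

/// V2 Byte value for RNTBD token `0x00F0` (`None` = omit the token).
fn rcs_token_byte(s: ReadConsistencyStrategy) -> Option<u8> {
    match s {
        ReadConsistencyStrategy::Default => None,
        ReadConsistencyStrategy::Eventual => Some(0x01),
        ReadConsistencyStrategy::Session => Some(0x02),
        ReadConsistencyStrategy::LatestCommitted => Some(0x03),
        ReadConsistencyStrategy::GlobalStrong => Some(0x04),
    }
}
```

Both transports consume the same `Resolved` value. Because it is an enum rather than two independent options, the dual-header case cannot even be represented.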
+ +##### Implementation footgun (Java bug class to avoid) + +Resolution MUST NOT mutate the request's header map in place. The Java fix in `RxGatewayStoreModel.applySessionToken()` switched to a header-map copy because the prior code's mutation rewrote `x-ms-consistency-level` (e.g., `LatestCommitted` was rewritten to `BoundedStaleness`); the gateway then rejected the request because `BoundedStaleness` was stricter than the Session account default. Even though the underlying conflict was real, the diagnostic was unrecoverable because the original headers had already been clobbered. + +For Rust: thread the resolved consistency value through the pipeline as an explicit input to whichever transport handler runs next. Do not write back into the operation's header collection during resolution. If the operation's header collection is needed for the final serialize step, clone it first or pass the resolved value separately. + #### Range header wire format EPK range headers (`x-ms-thinclient-range-min` / `-max`) carry the canonical, un-padded hex produced by `EffectivePartitionKey::compute_range()`. **Do not** zero-pad to N×32 on the wire. Local comparisons use `EffectivePartitionKey`'s `Ord` / `cmp` impl, which correctly handles the mixed-length boundaries returned by the backend; the `epk_cmp_*` tests in `container_routing_map.rs` (around L625–665) pin this behavior. The comparator is consumed via `binary_search_by(|r| r.min_inclusive.cmp(&epk_val))` (≈L282 of the same file). `@analogrelay`'s earlier zero-padding proposal in PR #4087 (commit `25233c903`) was **not** adopted; stay consistent with the length-aware convention. @@ -295,10 +401,12 @@ When `transport_mode == Gateway20`: 1. Set `x-ms-thinclient-proxy-operation-type` (numeric operation type) 2. Set `x-ms-thinclient-proxy-resource-type` (numeric resource type) -3. Point operation? Set `x-ms-effective-partition-key` (EPK hash from `EffectivePartitionKey::compute()`) - Feed operation? 
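The `binary_search_by` lookup mentioned above can be sketched generically. Search on `min_inclusive`; on `Err(i)`, fall back to range `i - 1`; then confirm the key sits below that range's exclusive upper bound. This is a minimal sketch assuming a hypothetical `EpkRange<K>` with half-open `[min_inclusive, max_exclusive)` bounds, sorted by `min_inclusive`. The demo uses `u32` keys only to keep the test readable. The driver would plug in `EffectivePartitionKey`, whose `Ord` handles the mixed-length boundaries.

```rust
/// Hypothetical half-open EPK range `[min_inclusive, max_exclusive)`.
struct EpkRange<K> {
    id: &'static str,
    min_inclusive: K,
    max_exclusive: K,
}

/// Finds the range containing `epk` in a slice sorted by `min_inclusive`.
fn find_range<'a, K: Ord>(ranges: &'a [EpkRange<K>], epk: &K) -> Option<&'a EpkRange<K>> {
    let idx = match ranges.binary_search_by(|r| r.min_inclusive.cmp(epk)) {
        Ok(i) => i,            // epk equals a range's lower bound
        Err(0) => return None, // epk sorts before the first range
        Err(i) => i - 1,       // epk falls after ranges[i - 1].min_inclusive
    };
    let candidate = &ranges[idx];
    (epk < &candidate.max_exclusive).then_some(candidate)
}
```

All ordering goes through `K::cmp`, so the lookup never needs padded strings. Correctness rests entirely on the key type's `Ord`.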
Set `x-ms-thinclient-range-min` and `x-ms-thinclient-range-max` (from `EffectivePartitionKey::compute_range()`) -4. Serialize headers + body into RNTBD binary format (Phase 1) -5. POST RNTBD body to gateway 2.0 endpoint via HTTP/2 +3. Set `x-ms-thinclient-account-name` (account host label) **and** `x-ms-thinclient-regional-account-name` (`<account-name>-<region-name>` lowercase, sourced from the active `CosmosEndpoint`'s region) — every request, see "Account-name + regional-account-name headers" note above +4. Point op or single-logical-partition query op? Set `x-ms-effective-partition-key` (EPK hash from `EffectivePartitionKey::compute()`); additionally set `x-ms-documentdb-partitionkey` (JSON-encoded PK value) **only when the supplied PK component count equals the container's partition-key definition arity** (full-key gating, see HPK note above). For HPK prefix-key requests, handle the request like the feed case below: set only the range headers and omit both `x-ms-effective-partition-key` and the PK header. + Cross-partition feed / query operation? Set `x-ms-thinclient-range-min` and `x-ms-thinclient-range-max` (from `EffectivePartitionKey::compute_range()`); do **not** emit the PK header on the feed path. +5. Serialize the **already-reconciled** consistency value (per "Consistency header reconciliation" above) into the appropriate RNTBD metadata token: the `ConsistencyLevel` token if RCS resolved to `Default` and a `ConsistencyLevel` was resolved at some level, OR the `ReadConsistencyStrategy` token (`0x00F0`, Byte) if RCS resolved to a non-Default value. Emit at most one of the two, never both; if neither was resolved, emit neither and let the account default apply. The serializer consumes the resolved value as input; do not re-run resolution here. +6. Serialize headers + body into RNTBD binary format (Phase 1) +7. POST RNTBD body to gateway 2.0 endpoint via HTTP/2 When `transport_mode != Gateway20`: Standard HTTP/REST request (existing flow, unchanged).
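The partition-header branch of the Gateway 2.0 injection steps, including HPK full-key gating, can be sketched as one function. This is a minimal sketch under stated assumptions. `PartitionInputs` and `partition_headers` are hypothetical. EPK strings are taken as already computed by `EffectivePartitionKey::compute()` / `::compute_range()`. The header-name constants are redefined locally. A component count above the definition arity is treated as the component-count-mismatch error that surfaces as `BadRequest`.

```rust
const EFFECTIVE_PARTITION_KEY: &str = "x-ms-effective-partition-key";
const PARTITION_KEY: &str = "x-ms-documentdb-partitionkey";
const THINCLIENT_RANGE_MIN: &str = "x-ms-thinclient-range-min";
const THINCLIENT_RANGE_MAX: &str = "x-ms-thinclient-range-max";

struct PartitionInputs<'a> {
    /// PK components supplied by the caller; `None` for a cross-partition feed.
    supplied_components: Option<usize>,
    /// Arity of the container's partition-key definition (1 for non-HPK containers).
    definition_arity: usize,
    pk_json: &'a str,
    epk: &'a str,
    epk_range: (&'a str, &'a str),
}

fn partition_headers(p: &PartitionInputs) -> Result<Vec<(&'static str, String)>, String> {
    match p.supplied_components {
        Some(n) if n > p.definition_arity => Err(format!(
            "{} partition-key components supplied, definition has {}",
            n, p.definition_arity
        )),
        // Full key: EPK plus the raw PK value.
        Some(n) if n == p.definition_arity => Ok(vec![
            (EFFECTIVE_PARTITION_KEY, p.epk.to_string()),
            (PARTITION_KEY, p.pk_json.to_string()),
        ]),
        // HPK prefix key or cross-partition feed: range carriers only, no PK header.
        _ => Ok(vec![
            (THINCLIENT_RANGE_MIN, p.epk_range.0.to_string()),
            (THINCLIENT_RANGE_MAX, p.epk_range.1.to_string()),
        ]),
    }
}
```

The gate is a single length comparison against the cached partition-key definition, matching the "no runtime cost beyond a length compare" note.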
@@ -310,7 +418,7 @@ EDIT src/driver/transport/transport_pipeline.rs — Branch on TransportMode in EDIT src/driver/transport/cosmos_headers.rs — Add gateway 2.0 header application EDIT src/driver/transport/mod.rs — Add is_operation_supported_by_gateway20() EDIT src/driver/pipeline/components.rs — Add EPK fields to TransportRequest if needed -EDIT src/driver/constants.rs (or NEW) — Relocate THINCLIENT_PROXY_* constants per §3.6-10 +EDIT src/driver/constants.rs (or NEW) — Relocate THINCLIENT_PROXY_* constants from azure_data_cosmos to azure_data_cosmos_driver (no SDK re-export, per §3.6-10) EDIT sdk/cosmos/azure_data_cosmos/src/... — Replace SDK-side get_hashed_partition_key_string callers with driver's EffectivePartitionKey::compute() ``` @@ -332,7 +440,7 @@ EDIT sdk/cosmos/azure_data_cosmos/src/... — Replace SDK-side get_hashe #### Region pairing (lock in the §PR #3942 decision) -Thin-client read locations pair **only** with read regions; thin-client write locations pair **only** with write regions. A write region that advertises no thin-client URL falls back to standard gateway **for writes** (this was deliberate in PR #3942: session retries that reroute reads to write endpoints would otherwise cross the read/write thin-client split). This is a correctness invariant — do not "fix" it by cross-pairing. +Gateway 2.0 read locations pair **only** with read regions; Gateway 2.0 write locations pair **only** with write regions. A write region that advertises no Gateway 2.0 URL falls back to standard gateway **for writes** (this was deliberate in PR #3942: session retries that reroute reads to write endpoints would otherwise cross the read/write Gateway 2.0 split). This is a correctness invariant — do not "fix" it by cross-pairing. #### Endpoint Discovery Flow (Existing) @@ -390,12 +498,12 @@ EDIT src/driver/pipeline/components.rs — Gateway 2.0 retry surface **Crate**: `azure_data_cosmos` -Gateway 2.0 is **not exposed as a customer-facing configuration**. 
The SDK automatically uses gateway 2.0 when the account metadata advertises thin client endpoints. This matches the design philosophy of both Java and .NET SDKs. +Gateway 2.0 is **on by default** when the account metadata advertises Gateway 2.0 endpoints; users opt **out** (not in) via `CosmosClientOptions::gateway20_disabled` if they need to. This matches the design philosophy of both Java and .NET SDKs and minimizes friction for the common case. #### What Will Be Done - **Auto-detection** — When account metadata includes `thinClientReadableLocations` / `thinClientWritableLocations`, the driver automatically prefers gateway 2.0 for eligible operations (per §3.4). No user opt-in required. -- **Internal kill switch** — `ConnectionPoolOptions.is_gateway20_allowed` and its env var (§3.4) remain internal. They are NOT exposed in `CosmosClientOptions` and are unsupported/undocumented. +- **Operator override** — `CosmosClientOptions::gateway20_disabled` (default `false`) is a public, documented setting for forcing standard-gateway routing per-client. **It carries an explicit warning that flipping it voids Gateway 2.0's latency SLA and impacts 24/7 Microsoft support eligibility for performance regressions.** Intentionally not exposed via env var. See §3.4 for the full normative wording. The legacy `ConnectionPoolOptions::is_gateway20_allowed` bring-up scaffolding is removed in Phase 5; `gateway20_disabled` is the single supported disablement mechanism. - **Diagnostics** — `CosmosDiagnostics` should report when a request used gateway 2.0 vs standard gateway (already partially done via `TransportKind::Gateway20`). - **User agent** — Update SDK user agent string to indicate gateway 2.0 capability. - **EPK cutover** — Replace SDK-side callers of `get_hashed_partition_key_string` with calls into the driver's `EffectivePartitionKey::compute()` / `::compute_range()` (this is the cutover PR #4087 flagged). 
Gateway 2.0 header injection depends on this being correct for hierarchical-PK containers. @@ -410,7 +518,7 @@ When account metadata includes `thinClientReadableLocations`, gateway 2.0 is ena EDIT src/driver_bridge.rs — Ensure internal config passes through EDIT src/handler/container_connection.rs — Route EPK through driver's EffectivePartitionKey::compute() EDIT src/partition_key.rs — Update feed_range_from_partition_key call site -EDIT src/constants.rs — Relocate / re-export header constants per §3.6-10 +EDIT src/constants.rs — Remove THINCLIENT_PROXY_* constants (relocated to driver crate, no SDK re-export, per §3.6-10) ``` --- @@ -421,13 +529,13 @@ Testing covers all layers from unit to E2E, matching or exceeding Java/.NET test #### Live Tests Pipeline -A **new dedicated CI pipeline** is required for gateway 2.0 live tests. Gateway 2.0 requires a Cosmos DB account with thin client endpoints enabled, which is separate from the standard emulator and live test infrastructure. +A **new dedicated CI pipeline** is required for gateway 2.0 live tests. Gateway 2.0 requires a Cosmos DB account with Gateway 2.0 endpoints enabled, which is separate from the standard emulator and live test infrastructure. **Trigger:** PR changes to `sdk/cosmos/**` + manual dispatch **Provision:** -- Use a **dedicated, pre-provisioned Cosmos DB account** with gateway 2.0 / thin client endpoints enabled (hardcoded for this pipeline, reused across runs) +- Use a **dedicated, pre-provisioned Cosmos DB account** with Gateway 2.0 endpoints enabled (hardcoded for this pipeline, reused across runs) - Account credentials stored in pipeline secrets (e.g., `AZURE_COSMOS_GW20_ENDPOINT`, `AZURE_COSMOS_GW20_KEY`) - Multi-region configuration (at least 2 regions) - Verify `thinClientReadableLocations` in account metadata at pipeline start @@ -437,6 +545,7 @@ A **new dedicated CI pipeline** is required for gateway 2.0 live tests. 
Gateway - Single-region gateway 2.0 - Multi-region gateway 2.0 with failover - Gateway 2.0 + standard gateway eligibility fallback (per-request only; normal retries still apply) +- Operator override (`CosmosClientOptions::gateway20_disabled = true`) — assert all eligible Document ops route through the standard gateway **Test Suites:** @@ -444,7 +553,7 @@ A **new dedicated CI pipeline** is required for gateway 2.0 live tests. Gateway - Query (single-partition, cross-partition) - Batch operations - Change feed (LatestVersion) -- Retry scenarios (408, 410, 503) +- Retry scenarios (408, 410, 449, 503, 404/1002) - Diagnostics validation (`TransportKind::Gateway20`) **Artifacts:** Test results (JUnit XML), diagnostics logs, perf metrics (RU, latency) @@ -461,19 +570,32 @@ A **new dedicated CI pipeline** is required for gateway 2.0 live tests. Gateway | Test Category | Unit | Integration | E2E | Scenarios | | --- | --- | --- | --- | --- | | RNTBD serialization | Yes | | | Round-trip, edge cases, malformed input | +| RNTBD unknown-token tolerance | Yes | | | Inject synthetic unknown token IDs into a response frame; deserializer must skip them silently (no per-token log), never panic / error / drop the rest of the response | | EPK computation | Yes | | | Single/hierarchical PK, hash versions 1 and 2, error cases (MultiHash V1, wrong component count) | | Operation filtering | Yes | | | All ResourceType × OperationType combos; asserts StoredProc Execute is rejected | | Header injection | Yes | | | Point vs feed EPK headers, proxy type headers, range-header un-padded form | +| Account-name + regional-account-name headers | Yes | | | Both `x-ms-thinclient-account-name` (account host label) and `x-ms-thinclient-regional-account-name` (`<account-name>-<region-name>` lowercase, matching the active `CosmosEndpoint` region) present on every Gateway 2.0 request (point, feed, batch, bulk, change feed). Multi-region client: assert regional value changes when the active endpoint switches regions.
| +| SDK-supported-capabilities header | Yes | | | `x-ms-cosmos-sdk-supportedcapabilities` value emitted is the bitmask string for `(PartitionMerge \| IgnoreUnknownRntbdTokens)`, **not** `"0"`. Pin against the integer value sourced from .NET `SDKSupportedCapabilities.cs`. | +| HPK PK+EPK pairing (full-key gating) | Yes | | | Single-component container point op → emits both `x-ms-documentdb-partitionkey` and `x-ms-effective-partition-key`. HPK container full-key point op AND full-key single-logical-partition query → emits both. HPK container prefix-key request (component count < definition arity) → emits ONLY the EPK-range headers, NOT `x-ms-documentdb-partitionkey`. Cross-partition feed → emits neither PK header (only the range headers). | +| Consistency reconciliation: token + header encoding | Yes | | | RNTBD token `0x00F0` Byte round-trip for all 4 strategies; HTTP header `x-ms-cosmos-read-consistency-strategy` exact wire-string mapping for all 4 strategies; `Default` emits neither carrier on either transport. | +| Consistency reconciliation: dual-header rejection | Yes | | | SDK never emits both `x-ms-consistency-level` AND `x-ms-cosmos-read-consistency-strategy` on V1; never emits both `ConsistencyLevel` and `ReadConsistencyStrategy` RNTBD tokens on V2. Verified across all 16 (CL × RCS, request-level × client-level) combinations. | +| Consistency reconciliation: 4-source precedence | Yes | | | Request-RCS > Request-CL > Client-RCS > Client-CL > account default; `Default` at any RCS layer is a pass-through. Representative subset matching Java's data-provider tests. | +| Consistency reconciliation: GlobalStrong validation | Yes | | | RCS=GlobalStrong on a non-Strong account produces a fail-fast `azure_core::Error` (no wire request emitted); on a Strong account the request proceeds normally. 
| +| Consistency reconciliation: header-map immutability | Yes | | | Resolution does not mutate the operation's original request headers; an `applySessionToken`-equivalent rewrite cannot clobber `x-ms-consistency-level`. | +| Consistency reconciliation: write-op behavior | Yes | | | Write op + RCS set → RCS is ignored, `ConsistencyLevel` (if any) flows through on the selected transport. | | Gateway 2.0 transport | Yes | Yes | | Correct HTTP/2 config, sharded pool selection | -| Read/write pairing | Yes | | | Write region without thin-client falls back for writes only | +| Read/write pairing | Yes | | | Write region without Gateway 2.0 URL falls back for writes only | | Point CRUD | | | Yes | Create, read, replace, upsert, patch, delete | | Query | | | Yes | SQL query, cross-partition | | Batch | | | Yes | Transactional batch ops | | Bulk | | | Yes | Fan-out CRUD, distinct from Batch | | Change feed | | | Yes | LatestVersion, incremental | | Retry: 408 timeout | | Yes | | Cross-region for reads, local-only for writes | +| Retry: 449 Retry-With | | Yes | | Same Gateway 2.0 endpoint, standard backoff, no region switch, no fallback to standard gateway | | Retry: 503 | | Yes | | Regional failover via existing retry policies | | Retry: 410 Gone | | Yes | | PKRange refresh (sub-status specific); NameCacheStale → collection cache | +| Retry: 404 / sub-status 1002 (PartitionKeyRangeGone) | | Yes | | PKRange cache refresh + retry against **remote-preferred** region; assert local-region retry only when no other region available; assert PLF region wins when PLF has pinned the PKRangeId | +| Operator override (`gateway20_disabled = true`) | Yes | Yes | | All eligible Document ops (point + feed + batch + change feed) route through standard gateway; default `false` does not change behavior | | Eligibility fallback | | Yes | | StoredProc Execute → standard gateway | | PLF precedence | | Yes | | Region without gw20_url + PLF override → standard gateway path | | Multi-region 
failover | | Yes | Yes | Preferred regions, failover | @@ -505,5 +627,5 @@ A **new dedicated CI pipeline** is required for gateway 2.0 live tests. Gateway ## 5. Open Questions - **Q1 — HTTP/2 prior knowledge vs ALPN**: _Resolved_. Gateway 2.0 always uses HTTP/2; the proxy does not accept HTTP/1.x. Rust uses HTTP/2 with prior knowledge on the Gateway 2.0 transport (no ALPN fallback to HTTP/1.x). The broader ALPN default in `TRANSPORT_PIPELINE_SPEC.md` does **not** apply to Gateway 2.0; if HTTP/2 negotiation fails, the request fails and the existing retry policies handle it. -- **Q2 — Live test account provisioning**: Cosmos DB account configuration flags required to enable gateway 2.0 / thin client endpoints are not part of the standard Bicep templates. _Resolution_: hardcode a dedicated, pre-provisioned thin client account for the gateway 2.0 live tests pipeline and reuse it across runs (rather than provisioning per-run via Bicep). Account name and credentials stored in pipeline secrets (`AZURE_COSMOS_GW20_ENDPOINT`, `AZURE_COSMOS_GW20_KEY`); pipeline reads endpoint from environment variables. +- **Q2 — Live test account provisioning**: Cosmos DB account configuration flags required to enable Gateway 2.0 endpoints are not part of the standard Bicep templates. _Resolution_: hardcode a dedicated, pre-provisioned Gateway 2.0 account for the gateway 2.0 live tests pipeline and reuse it across runs (rather than provisioning per-run via Bicep). Account name and credentials stored in pipeline secrets (`AZURE_COSMOS_GW20_ENDPOINT`, `AZURE_COSMOS_GW20_KEY`); pipeline reads endpoint from environment variables. - **Q3 — EPK range header names**: _Resolved_. The Gateway 2.0 proxy requires the Java header names `x-ms-thinclient-range-min` / `x-ms-thinclient-range-max`. 
Phase 2 introduces new constants (`THINCLIENT_RANGE_MIN`, `THINCLIENT_RANGE_MAX`) on the Gateway 2.0 path; the existing `START_EPK` / `END_EPK` (`x-ms-start-epk` / `x-ms-end-epk`) constants remain for any non-Gateway-2.0 callers but are **not** emitted on Gateway 2.0 requests. From 5ed52469b72651bc1928fd0264d4015e338530e7 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Sun, 26 Apr 2026 19:17:55 -0700 Subject: [PATCH 15/48] Scrub 'footgun' from Gateway 2.0 spec Replace 'footgun' with 'pitfall' in two prose locations (the Java header-mutation hazard heading and the Range semantics blockquote) and drop 'footgun' from the inline cspell:ignore directive. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index 847125647c8..0d83b9f0877 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -1,4 +1,4 @@ - + # Gateway 2.0 Design Spec for Rust Driver & SDK **Status**: Draft / Iterating @@ -383,7 +383,7 @@ The compute gateway rejects requests that carry both `x-ms-consistency-level` AN When the resolved RCS is `GlobalStrong` and the account default consistency is **not** `Strong`, the driver MUST fail the operation **before** transport selection / serialization with a `BadRequestException`-equivalent (Rust: `azure_core::Error` with the appropriate `ErrorKind`). This avoids a wasted round-trip and matches Java's fail-fast semantics. The check uses the cached account properties already maintained by the driver; no additional metadata fetch is required. -##### Implementation footgun (Java bug class to avoid) +##### Implementation pitfall (Java bug class to avoid) Resolution MUST NOT mutate the request's header map in place. 
The Java fix in `RxGatewayStoreModel.applySessionToken()` switched to a header-map copy because the prior code's mutation rewrote `x-ms-consistency-level` (e.g., `LatestCommitted` was rewritten to `BoundedStaleness`); the gateway then rejected the request because `BoundedStaleness` was stricter than the Session account default. Even though the underlying conflict was real, the diagnostic was unrecoverable because the original headers had already been clobbered. @@ -393,7 +393,7 @@ For Rust: thread the resolved consistency value through the pipeline as an expli EPK range headers (`x-ms-thinclient-range-min` / `-max`) carry the canonical, un-padded hex produced by `EffectivePartitionKey::compute_range()`. **Do not** zero-pad to N×32 on the wire. Local comparisons use `EffectivePartitionKey`'s `Ord` / `cmp` impl, which correctly handles the mixed-length boundaries returned by the backend; the `epk_cmp_*` tests in `container_routing_map.rs` (around L625–665) pin this behavior. The comparator is consumed via `binary_search_by(|r| r.min_inclusive.cmp(&epk_val))` (≈L282 of the same file). `@analogrelay`'s earlier zero-padding proposal in PR #4087 (commit `25233c903`) was **not** adopted; stay consistent with the length-aware convention. -> **`Range` semantics footgun** (from PR #4087): `compute_range` returns a Rust `std::ops::Range` where `start == end` denotes a **point operation**. Standard `Range` iteration treats that as empty, so code that uses `.contains()` or iterates the range directly will misbehave. Always treat `start == end` as the point case explicitly. +> **`Range` semantics pitfall** (from PR #4087): `compute_range` returns a Rust `std::ops::Range` where `start == end` denotes a **point operation**. Standard `Range` iteration treats that as empty, so code that uses `.contains()` or iterates the range directly will misbehave. Always treat `start == end` as the point case explicitly. 
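The following is a minimal sketch of handling the point case explicitly, under stated assumptions:

- `String` stands in for the driver's `EffectivePartitionKey`. Only equality is used, so the length-aware `Ord` is not exercised.
- `EpkTarget`, `classify`, and `epk_headers` are hypothetical names, not existing driver APIs.

The sketch compares `start == end` first and only then picks either the point header or the two range headers named in this spec.

```rust
use std::ops::Range;

/// Hypothetical classification of a computed EPK range.
#[derive(Debug, PartialEq, Eq)]
pub enum EpkTarget<'a> {
    /// `start == end`: a single EPK, i.e. a point operation.
    Point(&'a str),
    /// `start != end`: a half-open EPK range for feed / cross-partition ops.
    Span { min: &'a str, max: &'a str },
}

/// Checks the point case explicitly instead of relying on `is_empty()` or `contains()`.
/// `String` is a stand-in for `EffectivePartitionKey` (assumption).
pub fn classify(range: &Range<String>) -> EpkTarget<'_> {
    if range.start == range.end {
        EpkTarget::Point(range.start.as_str())
    } else {
        EpkTarget::Span {
            min: range.start.as_str(),
            max: range.end.as_str(),
        }
    }
}

/// Maps the classification to the Gateway 2.0 header names used in this spec.
/// Values are passed through un-padded, as produced by `compute_range()`.
pub fn epk_headers(target: &EpkTarget<'_>) -> Vec<(&'static str, String)> {
    match target {
        EpkTarget::Point(epk) => vec![("x-ms-effective-partition-key", epk.to_string())],
        EpkTarget::Span { min, max } => vec![
            ("x-ms-thinclient-range-min", min.to_string()),
            ("x-ms-thinclient-range-max", max.to_string()),
        ],
    }
}
```

With `start == end`, `std` reports the range as empty and `contains` returns `false`. A classifier built on those methods would therefore drop every point operation; comparing the bounds first avoids that.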
#### Gateway 2.0 Header Injection Flow From 766954060602c05067452e0ba9b3e7bd601680c8 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Mon, 27 Apr 2026 06:45:34 -0700 Subject: [PATCH 16/48] Remove internal ADO PR references from spec Drop two mentions of internal ADO PR #2031635 from the ReadConsistencyStrategy section. Java PR #48787 and .NET PR #5685 remain as the public cross-SDK references. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index 0d83b9f0877..146d5bd3adc 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -347,7 +347,7 @@ The Cosmos SDK exposes two consistency knobs that can both target the same read - **`ConsistencyLevel`** — per-request override of the account default consistency. - **`ReadConsistencyStrategy`** (defined in `azure_data_cosmos_driver::options::read_consistency`) — read-only strategy override (`Default`, `Eventual`, `Session`, `LatestCommitted`, `GlobalStrong`); supersedes `ConsistencyLevel` on reads. -This subsection is the Rust mirror of the cross-SDK design landed in [Java PR #48787](https://github.com/Azure/azure-sdk-for-java/pull/48787) (with .NET parity in PR #5685 and proxy-side changes coordinated via internal ADO PR #2031635). Wire-format and resolution semantics MUST match Java/.NET so that a single proxy-side validation suite is sufficient. +This subsection is the Rust mirror of the cross-SDK design landed in [Java PR #48787](https://github.com/Azure/azure-sdk-for-java/pull/48787) (with .NET parity in PR #5685 and matching proxy-side changes). Wire-format and resolution semantics MUST match Java/.NET so that a single proxy-side validation suite is sufficient. 
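The resolution rules this subsection mirrors can be sketched as a pure function. This is a minimal illustration, not the driver's implementation:

- The enum and struct names are hypothetical stand-ins, and `String` stands in for `azure_core::Error`.
- Precedence follows the test matrix: Request-RCS > Request-CL > Client-RCS > Client-CL > account default, with `Default` passing through.
- Inputs are borrowed immutably and a single carrier comes back, in line with the rule that resolution must not mutate the request's header map.
- Write operations ignore RCS.
- `GlobalStrong` on a non-Strong account fails before serialization.

The byte values are the `0x00F0` token values listed under Wire carriers.

```rust
/// Hypothetical stand-in for the driver's `ConsistencyLevel` option.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ConsistencyLevel {
    Strong,
    BoundedStaleness,
    Session,
    ConsistentPrefix,
    Eventual,
}

/// Hypothetical stand-in for `ReadConsistencyStrategy`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ReadConsistencyStrategy {
    Default,
    Eventual,
    Session,
    LatestCommitted,
    GlobalStrong,
}

impl ReadConsistencyStrategy {
    /// Byte value for RNTBD token 0x00F0; `None` means the token is omitted.
    pub fn rntbd_byte(self) -> Option<u8> {
        match self {
            Self::Default => None,
            Self::Eventual => Some(0x01),
            Self::Session => Some(0x02),
            Self::LatestCommitted => Some(0x03),
            Self::GlobalStrong => Some(0x04),
        }
    }
}

/// The single carrier chosen by resolution; serializers never see both knobs.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Resolved {
    Level(ConsistencyLevel),
    Strategy(ReadConsistencyStrategy),
    AccountDefault,
}

/// Per-layer inputs (hypothetical shape). Resolution only borrows them.
#[derive(Debug, Default)]
pub struct ConsistencyInputs {
    pub request_rcs: Option<ReadConsistencyStrategy>,
    pub request_cl: Option<ConsistencyLevel>,
    pub client_rcs: Option<ReadConsistencyStrategy>,
    pub client_cl: Option<ConsistencyLevel>,
}

/// `Default` at any RCS layer counts as "not set" and passes through.
fn explicit(rcs: Option<ReadConsistencyStrategy>) -> Option<ReadConsistencyStrategy> {
    rcs.filter(|r| *r != ReadConsistencyStrategy::Default)
}

/// Hypothetical resolver; `String` stands in for `azure_core::Error`.
pub fn resolve(
    inputs: &ConsistencyInputs,
    is_read: bool,
    account_default: ConsistencyLevel,
) -> Result<Resolved, String> {
    // RCS applies to reads only; write operations ignore it.
    let (request_rcs, client_rcs) = if is_read {
        (explicit(inputs.request_rcs), explicit(inputs.client_rcs))
    } else {
        (None, None)
    };
    // Request-RCS > Request-CL > Client-RCS > Client-CL > account default.
    let resolved = match (request_rcs, inputs.request_cl, client_rcs, inputs.client_cl) {
        (Some(rcs), _, _, _) => Resolved::Strategy(rcs),
        (None, Some(cl), _, _) => Resolved::Level(cl),
        (None, None, Some(rcs), _) => Resolved::Strategy(rcs),
        (None, None, None, Some(cl)) => Resolved::Level(cl),
        (None, None, None, None) => Resolved::AccountDefault,
    };
    // Fail fast, before transport selection: GlobalStrong needs a Strong account.
    if resolved == Resolved::Strategy(ReadConsistencyStrategy::GlobalStrong)
        && account_default != ConsistencyLevel::Strong
    {
        return Err("GlobalStrong requires an account with Strong default consistency".into());
    }
    Ok(resolved)
}
```

Because the serializers receive `Resolved` rather than raw options, each transport can emit exactly one carrier. That is the property the dual-header rejection tests check.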
##### Wire carriers @@ -356,7 +356,7 @@ This subsection is the Rust mirror of the cross-SDK design landed in [Java PR #4 | Standard Gateway (V1, HTTP) | HTTP request header `x-ms-cosmos-read-consistency-strategy` (per Java `HttpConstants.READ_CONSISTENCY_STRATEGY`) | String, exact case-sensitive values: `"Eventual"`, `"Session"`, `"LatestCommitted"`, `"GlobalStrong"`. Header is omitted entirely when the resolved RCS is `Default`. | | Gateway 2.0 (RNTBD) | RNTBD metadata token ID `0x00F0` | **Byte** type — `Eventual = 0x01`, `Session = 0x02`, `LatestCommitted = 0x03`, `GlobalStrong = 0x04`. The token MUST be Byte-encoded; per the Java PR an earlier String-typed prototype caused the proxy to hang. The token is omitted entirely when the resolved RCS is `Default`. | -The byte values are pinned against the proxy's C++ enum (proxy ADO PR #2031635). Phase 1's RNTBD token catalog grows a row for `ReadConsistencyStrategy = 0x00F0 (Byte)` enumerating the four byte values. +The byte values are pinned against the proxy's C++ enum. Phase 1's RNTBD token catalog grows a row for `ReadConsistencyStrategy = 0x00F0 (Byte)` enumerating the four byte values. ##### Resolution precedence From c3ba85614d62e82e73c0b151c8d2774c64d96328 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Mon, 27 Apr 2026 07:51:32 -0700 Subject: [PATCH 17/48] Resolve EPK range type consolidation in spec Convert open-question item 11 from a deferred audit deliverable into a concrete decision. Audit results documented inline: - Driver crate is canonical: EpkRange, PartitionKeyRange (typed EffectivePartitionKey bounds), and the EffectivePartitionKey newtype with compute_range(). - SDK-crate analogs (routing::range::Range, routing::partition_key_range, hash::EffectivePartitionKey) are NOT used on the Gateway 2.0 path and remain only for legacy non-Gateway-2.0 SDK callers. Phase 2 EPK header injection MUST reuse the driver-crate types and MUST NOT introduce a new EPK-range struct or depend on any SDK-crate analog. 
Consistent with item 10 (no SDK Gateway-2.0 surface). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../docs/GATEWAY_20_SPEC.md | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index 146d5bd3adc..d778e3a7ef3 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -145,7 +145,18 @@ Invariants this spec locks in: 8. **Integration/E2E tests** — No gateway 2.0 test coverage beyond the routing-systems unit tests 9. **Fault injection** — No gateway 2.0 fault injection scenarios 10. **Constants cross-crate visibility** — _Resolved_. Per PR review (analogrelay): the SDK has no Gateway-2.0 surface area whatsoever. `THINCLIENT_PROXY_*`, `THINCLIENT_RANGE_MIN/MAX`, and Gateway-2.0-specific header constants live exclusively in `azure_data_cosmos_driver::constants`; **no SDK re-export**. The SDK calls the generic `CosmosDriver::execute_operation` interface and the driver decides Gateway 2.0 vs standard gateway internally. The legacy `START_EPK` / `END_EPK` constants in `azure_data_cosmos::constants` remain for any non-Gateway-2.0 callers but are not used on the Gateway 2.0 path. Phase 2 deliverable includes the move. -11. **EPK Range type consolidation** — There appear to be multiple `EpkRange` / `PartitionKeyRange` / EPK-bound representations across `azure_data_cosmos` and `azure_data_cosmos_driver`. **Pre-Phase-2 audit deliverable**: enumerate every EPK-range-shaped struct in both crates, document overlap, and pick one canonical representation. Phase 2's EPK header injection MUST reuse the chosen canonical type — it must not introduce a new EPK-range type. Track in PR review of the Phase 2 implementation. +11. **EPK Range type consolidation** — _Resolved_. 
Audit results across both crates: + + | Crate | Type | Verdict | + | --- | --- | --- | + | `azure_data_cosmos_driver::models::range::EpkRange` | Generic range with `min` / `max` / `is_min_inclusive` / `is_max_inclusive`, plus `contains` / `is_empty` / `check_overlapping` / `Display` (`[a,b)` form) | **Canonical** for typed EPK ranges | + | `azure_data_cosmos_driver::models::partition_key_range::PartitionKeyRange` | Service model with `min_inclusive: EffectivePartitionKey` / `max_exclusive: EffectivePartitionKey` and full PKR metadata (id, rid, parents, throughput, status, lsn) | **Canonical** for cached PKR entries | + | `azure_data_cosmos_driver::models::effective_partition_key::EffectivePartitionKey` | Strongly-typed EPK newtype with `compute_range()` returning `std::ops::Range` | **Canonical** EPK value type | + | `azure_data_cosmos::routing::range::Range` | SDK-side generic range | **Not used** on the Gateway 2.0 path — legacy, kept only for non-Gateway-2.0 SDK callers | + | `azure_data_cosmos::routing::partition_key_range::PartitionKeyRange` | SDK-side PKR with `min_inclusive: String` / `max_exclusive: String` (untyped) | **Not used** on the Gateway 2.0 path | + | `azure_data_cosmos::hash::EffectivePartitionKey` | SDK-side `EffectivePartitionKey(String)` newtype, distinct from the driver's | **Not used** on the Gateway 2.0 path | + + **Decision**: every Gateway 2.0 EPK-range representation lives in the **driver crate**. Phase 2's EPK header injection MUST consume `EffectivePartitionKey::compute_range()` directly and serialize through the driver crate's existing types; it MUST NOT introduce a new EPK-range struct, and MUST NOT depend on any of the SDK-crate analogs. Per item 10 there is no SDK Gateway-2.0 surface, so the SDK-crate types stay untouched and unreferenced on this path. Phase 2 PR review enforces both rules. 12. 
**Gateway 2.0 retry behavior for region-routed status codes** — Beyond the timeout / 408 handling already deferred to `TRANSPORT_PIPELINE_SPEC.md`, the Gateway 2.0 path inherits these region-aware retry rules from the standard pipeline (no Gateway-2.0-specific override needed): - **HTTP 449 (Retry-With)** — Retry against the **same** Gateway 2.0 endpoint with the standard backoff schedule. **Do not** switch regions on 449. **Do not** fall back to standard gateway on 449 — the proxy is healthy; the backend asked for a retry. - **HTTP 404 with sub-status `1002` (`PARTITION_KEY_RANGE_GONE`)** — Refresh the PKRange cache, then retry. **Always prefer a remote region for the retry** when one is available in the client's preferred-region list — the local region is suspected of carrying the stale routing, so pinning the retry to the same Gateway 2.0 endpoint that just returned 1002 reproduces the bug. **PLF takes precedence**: if PLF (per `PARTITION_LEVEL_FAILOVER_SPEC.md`) has already pinned a region for this PKRangeId, the PLF region wins over the "prefer remote" hint. From 85a559b87a09dfcd81c46f3dcea31ed62c03e3a3 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Mon, 27 Apr 2026 09:06:34 -0700 Subject: [PATCH 18/48] Remove HPK PK-header gating prose from spec Drop the BaseProxyClientHttpMessageHandler attribution paragraph and the HPK full-key gating bullet section (single-component vs hierarchical, full-key vs prefix-key emission rules), and remove the matching cross-references downstream: - Header table row for x-ms-documentdb-partitionkey: simplified to describe co-emission with x-ms-effective-partition-key on point / single-logical-partition ops, without the prefix-key gating clause. - Header injection flow step 4: trimmed to just emit the EPK header; PK-header gating clause removed. - Test matrix: dropped the 'HPK PK+EPK pairing (full-key gating)' row. 
The spec no longer prescribes any HPK-specific PK-header gating; that behavior can be re-introduced in a follow-up if needed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../docs/GATEWAY_20_SPEC.md | 17 ++--------------- 1 file changed, 2 insertions(+), 15 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index d778e3a7ef3..c8f0e7a51e5 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -322,7 +322,7 @@ These are wire-level HTTP/2 request headers on the outer POST to the proxy. They | `x-ms-thinclient-account-name` | **NEW** — `THINCLIENT_ACCOUNT_NAME` (driver) | Global database account name (e.g., `myacct` from `myacct.documents.azure.com`); region-independent tenant identity. Source: .NET `BaseProxyClientHttpMessageHandler.AccountName` (`/Product/SDK/.net/Microsoft.Azure.Cosmos.Friends/src/BaseProxyClientHttpMessageHandler.cs:20`); value matches `GlobalDatabaseAccountName` (compute-gateway side: `SqlApiOperationHandler.cs:1135`). | Every Gateway 2.0 request | | `x-ms-thinclient-regional-account-name` | **NEW** — `THINCLIENT_REGIONAL_ACCOUNT_NAME` (driver) | Region-stamped document-service identity, format `-` lowercase (e.g., `myacct-eastus`). Source: .NET `BaseProxyClientHttpMessageHandler.RegionalAccountName` (`BaseProxyClientHttpMessageHandler.cs:22`); value matches `DocumentServiceId`; region-format derivation matches `AdminEndpointActions.cs:6236-6237`. 
| Every Gateway 2.0 request | | `x-ms-effective-partition-key` | **NEW** — `EFFECTIVE_PARTITION_KEY` (driver) | Canonical EPK hex | Point ops only | -| `x-ms-documentdb-partitionkey` | existing `PARTITION_KEY` constant (SDK) | JSON-encoded partition-key value | Point ops AND single-logical-partition query ops — emitted **alongside** `x-ms-effective-partition-key` **only when the request carries the full partition-key value** (see HPK note below). For HPK containers scoped to a prefix of the partition-key definition, this header is **omitted** and only the EPK / EPK-range headers are sent. | +| `x-ms-documentdb-partitionkey` | existing `PARTITION_KEY` constant (SDK) | JSON-encoded partition-key value | Point ops AND single-logical-partition query ops, alongside `x-ms-effective-partition-key` | | `x-ms-thinclient-range-min` | **NEW** — `THINCLIENT_RANGE_MIN` (driver) | Lower bound of EPK range | Feed / cross-partition ops only | | `x-ms-thinclient-range-max` | **NEW** — `THINCLIENT_RANGE_MAX` (driver) | Upper bound of EPK range | Feed / cross-partition ops only | | `x-ms-cosmos-use-thinclient` | **NEW** (driver) | Instructs account-metadata response to advertise thin-client endpoints | Account metadata fetches only | @@ -339,18 +339,6 @@ Source values: - Account-name = the host label of the account endpoint URL (the `myacct` portion of `myacct.documents.azure.com`), parsed once at client construction. - Regional-account-name = `-` lowercase, where `region-name` is the location string of the target `CosmosEndpoint` (e.g., `myacct-eastus`). The Rust driver's per-`CosmosEndpoint` region context (already maintained by the PLF / preferred-region machinery) is the natural source. -Both headers mirror .NET's `BaseProxyClientHttpMessageHandler` (the shared base class used by `ThinClientHttpMessageHandler`, `DqsClientHttpMessageHandler`, and `DtcClientHttpMessageHandler` — i.e., every proxy path emits both unconditionally). 
- -**PK header alongside EPK (HPK full-key gating)**: the driver emits `x-ms-documentdb-partitionkey` (the raw, JSON-encoded partition-key value) **only when the operation carries the full partition-key value** — i.e., the number of components supplied equals the container's partition-key definition arity. This applies to both point operations and single-logical-partition query operations. - -- **Single-component (non-HPK) containers**: every point op and every single-logical-partition query supplies the full PK by definition, so the header is always emitted alongside `x-ms-effective-partition-key`. -- **Hierarchical (multi-component, HPK) containers**: - - Full-key request (component count == definition arity) → emit BOTH `x-ms-documentdb-partitionkey` AND `x-ms-effective-partition-key`. The proxy can use the raw PK to skip recomputing EPK and to choose finer-grained replica selection than EPK alone allows. - - Prefix-key request (component count < definition arity) → emit ONLY the EPK-range carriers (`x-ms-thinclient-range-min` / `-max`). Do **not** emit `x-ms-documentdb-partitionkey` with a partial value, because the proxy treats that header as the canonical full PK and a partial value would route incorrectly. -- **Cross-partition feed / query ops**: continue to emit only the EPK range headers — no PK header on the feed path regardless of HPK arity. - -Gating is decided at header-injection time using the partition-key definition (already cached on the container) and the operation's supplied PK component count; no runtime computation cost beyond a length compare. - #### Consistency header reconciliation (`ConsistencyLevel` ↔ `ReadConsistencyStrategy`) The Cosmos SDK exposes two consistency knobs that can both target the same read operation: @@ -413,7 +401,7 @@ When `transport_mode == Gateway20`: 1. Set `x-ms-thinclient-proxy-operation-type` (numeric operation type) 2. Set `x-ms-thinclient-proxy-resource-type` (numeric resource type) 3. 
Set `x-ms-thinclient-account-name` (account host label) **and** `x-ms-thinclient-regional-account-name` (`-` lowercase, sourced from the active `CosmosEndpoint`'s region) — every request, see "Account-name + regional-account-name headers" note above -4. Point op or single-logical-partition query op? Set `x-ms-effective-partition-key` (EPK hash from `EffectivePartitionKey::compute()`); additionally set `x-ms-documentdb-partitionkey` (JSON-encoded PK value) **only when the supplied PK component count equals the container's partition-key definition arity** (full-key gating, see HPK note above). For HPK prefix-key requests, omit the PK header. +4. Point op or single-logical-partition query op? Set `x-ms-effective-partition-key` (EPK hash from `EffectivePartitionKey::compute()`). Cross-partition feed / query operation? Set `x-ms-thinclient-range-min` and `x-ms-thinclient-range-max` (from `EffectivePartitionKey::compute_range()`); do **not** emit the PK header on the feed path. 5. Serialize the **already-reconciled** consistency value (per "Consistency header reconciliation" above) into the appropriate RNTBD metadata token: `ConsistencyLevel` token if RCS resolved to `Default`, OR the `ReadConsistencyStrategy` token (`0x00F0`, Byte) if RCS resolved to a non-Default value. Emit exactly one of the two — never both. The serializer consumes the resolved value as input; do not re-run resolution here. 6. Serialize headers + body into RNTBD binary format (Phase 1) @@ -587,7 +575,6 @@ A **new dedicated CI pipeline** is required for gateway 2.0 live tests. Gateway | Header injection | Yes | | | Point vs feed EPK headers, proxy type headers, range-header un-padded form | | Account-name + regional-account-name headers | Yes | | | Both `x-ms-thinclient-account-name` (account host label) and `x-ms-thinclient-regional-account-name` (`-` lowercase, matching the active `CosmosEndpoint` region) present on every Gateway 2.0 request (point, feed, batch, bulk, change feed). 
Multi-region client: assert regional value changes when the active endpoint switches regions. | | SDK-supported-capabilities header | Yes | | | `x-ms-cosmos-sdk-supportedcapabilities` value emitted is the bitmask string for `(PartitionMerge \| IgnoreUnknownRntbdTokens)`, **not** `"0"`. Pin against the integer value sourced from .NET `SDKSupportedCapabilities.cs`. | -| HPK PK+EPK pairing (full-key gating) | Yes | | | Single-component container point op → emits both `x-ms-documentdb-partitionkey` and `x-ms-effective-partition-key`. HPK container full-key point op AND full-key single-logical-partition query → emits both. HPK container prefix-key request (component count < definition arity) → emits ONLY the EPK-range headers, NOT `x-ms-documentdb-partitionkey`. Cross-partition feed → emits neither PK header (only the range headers). | | Consistency reconciliation: token + header encoding | Yes | | | RNTBD token `0x00F0` Byte round-trip for all 4 strategies; HTTP header `x-ms-cosmos-read-consistency-strategy` exact wire-string mapping for all 4 strategies; `Default` emits neither carrier on either transport. | | Consistency reconciliation: dual-header rejection | Yes | | | SDK never emits both `x-ms-consistency-level` AND `x-ms-cosmos-read-consistency-strategy` on V1; never emits both `ConsistencyLevel` and `ReadConsistencyStrategy` RNTBD tokens on V2. Verified across all 16 (CL × RCS, request-level × client-level) combinations. | | Consistency reconciliation: 4-source precedence | Yes | | | Request-RCS > Request-CL > Client-RCS > Client-CL > account default; `Default` at any RCS layer is a pass-through. Representative subset matching Java's data-provider tests. 
| From c31bcf770ea7b39fa300854bb295e0842bdcd3a5 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Mon, 27 Apr 2026 09:24:09 -0700 Subject: [PATCH 19/48] Use GlobalDatabaseAccountName RNTBD token for tenant identity Replace the two Gateway-2.0-specific HTTP headers (x-ms-thinclient-account-name, x-ms-thinclient-regional-account-name) with the existing RNTBD GlobalDatabaseAccountName token (0x00CE, String, optional), carried in the RNTBD metadata stream on every Gateway 2.0 request. - Headers table: dropped both -account-name and -regional-account-name rows. - Prose: replaced the proxy-tenant-routing section with a brief Tenant-identification note pointing to the RNTBD token. - Injection flow step 3: now serializes the RNTBD token instead of setting two HTTP headers. - Test matrix: row reframed to assert presence of the RNTBD token in the request metadata stream. This drops the regional-account-name carrier entirely; the proxy uses the global account-name only. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../docs/GATEWAY_20_SPEC.md | 16 +++------------- 1 file changed, 3 insertions(+), 13 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index c8f0e7a51e5..52c7757335c 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -319,8 +319,6 @@ These are wire-level HTTP/2 request headers on the outer POST to the proxy. 
They | --- | --- | --- | --- | | `x-ms-thinclient-proxy-operation-type` | `THINCLIENT_PROXY_OPERATION_TYPE` (driver) | Numeric operation type | Every Gateway 2.0 request | | `x-ms-thinclient-proxy-resource-type` | `THINCLIENT_PROXY_RESOURCE_TYPE` (driver) | Numeric resource type | Every Gateway 2.0 request | -| `x-ms-thinclient-account-name` | **NEW** — `THINCLIENT_ACCOUNT_NAME` (driver) | Global database account name (e.g., `myacct` from `myacct.documents.azure.com`); region-independent tenant identity. Source: .NET `BaseProxyClientHttpMessageHandler.AccountName` (`/Product/SDK/.net/Microsoft.Azure.Cosmos.Friends/src/BaseProxyClientHttpMessageHandler.cs:20`); value matches `GlobalDatabaseAccountName` (compute-gateway side: `SqlApiOperationHandler.cs:1135`). | Every Gateway 2.0 request | -| `x-ms-thinclient-regional-account-name` | **NEW** — `THINCLIENT_REGIONAL_ACCOUNT_NAME` (driver) | Region-stamped document-service identity, format `-` lowercase (e.g., `myacct-eastus`). Source: .NET `BaseProxyClientHttpMessageHandler.RegionalAccountName` (`BaseProxyClientHttpMessageHandler.cs:22`); value matches `DocumentServiceId`; region-format derivation matches `AdminEndpointActions.cs:6236-6237`. | Every Gateway 2.0 request | | `x-ms-effective-partition-key` | **NEW** — `EFFECTIVE_PARTITION_KEY` (driver) | Canonical EPK hex | Point ops only | | `x-ms-documentdb-partitionkey` | existing `PARTITION_KEY` constant (SDK) | JSON-encoded partition-key value | Point ops AND single-logical-partition query ops, alongside `x-ms-effective-partition-key` | | `x-ms-thinclient-range-min` | **NEW** — `THINCLIENT_RANGE_MIN` (driver) | Lower bound of EPK range | Feed / cross-partition ops only | @@ -329,15 +327,7 @@ These are wire-level HTTP/2 request headers on the outer POST to the proxy. They Per Q3 resolution, the Gateway 2.0 proxy requires the Java header names `x-ms-thinclient-range-min` / `x-ms-thinclient-range-max` (it does **not** accept `x-ms-start-epk` / `x-ms-end-epk`). 
Phase 2 introduces the new constants above; the existing `START_EPK` / `END_EPK` constants are not emitted on the Gateway 2.0 path. -**Account-name + regional-account-name headers (proxy tenant routing)**: both `x-ms-thinclient-account-name` and `x-ms-thinclient-regional-account-name` are emitted on **every** Gateway 2.0 request (point, feed, batch, bulk, change feed, etc.) — this is the established proxy contract today, not future-proofing. The proxy uses the two headers in tandem: - -- **Account-name** identifies the tenant (which Cosmos account this request belongs to) so the proxy can look up the correct backend federation. Region-independent. -- **Regional-account-name** pins the request to a specific regional document-service / compute-gateway federation. This is required because globally-distributed accounts have one federation per region — writes must land in the write region while reads can stay local — so the proxy must know not just *which account* but also *which physical regional federation* to route to without re-resolving on every request. - -Source values: - -- Account-name = the host label of the account endpoint URL (the `myacct` portion of `myacct.documents.azure.com`), parsed once at client construction. -- Regional-account-name = `-` lowercase, where `region-name` is the location string of the target `CosmosEndpoint` (e.g., `myacct-eastus`). The Rust driver's per-`CosmosEndpoint` region context (already maintained by the PLF / preferred-region machinery) is the natural source. +**Tenant identification (RNTBD token, not HTTP header)**: the proxy identifies the target Cosmos account from the existing RNTBD `GlobalDatabaseAccountName` token (`0x00CE`, `String`, optional) carried inside the RNTBD metadata stream on **every** Gateway 2.0 request. No Gateway-2.0-specific HTTP headers are introduced for account or regional-account identification — the RNTBD token is the canonical carrier and matches the Java/.NET wire contract. 
The value is the global database account name (e.g., `myacct` from `myacct.documents.azure.com`), parsed once from the account endpoint URL at client construction. #### Consistency header reconciliation (`ConsistencyLevel` ↔ `ReadConsistencyStrategy`) @@ -400,7 +390,7 @@ When `transport_mode == Gateway20`: 1. Set `x-ms-thinclient-proxy-operation-type` (numeric operation type) 2. Set `x-ms-thinclient-proxy-resource-type` (numeric resource type) -3. Set `x-ms-thinclient-account-name` (account host label) **and** `x-ms-thinclient-regional-account-name` (`-` lowercase, sourced from the active `CosmosEndpoint`'s region) — every request, see "Account-name + regional-account-name headers" note above +3. Serialize the `GlobalDatabaseAccountName` RNTBD metadata token (`0x00CE`, `String`, optional) with the account host label (e.g., `myacct`) — every request, see "Tenant identification" note above 4. Point op or single-logical-partition query op? Set `x-ms-effective-partition-key` (EPK hash from `EffectivePartitionKey::compute()`). Cross-partition feed / query operation? Set `x-ms-thinclient-range-min` and `x-ms-thinclient-range-max` (from `EffectivePartitionKey::compute_range()`); do **not** emit the PK header on the feed path. 5. Serialize the **already-reconciled** consistency value (per "Consistency header reconciliation" above) into the appropriate RNTBD metadata token: `ConsistencyLevel` token if RCS resolved to `Default`, OR the `ReadConsistencyStrategy` token (`0x00F0`, Byte) if RCS resolved to a non-Default value. Emit exactly one of the two — never both. The serializer consumes the resolved value as input; do not re-run resolution here. @@ -573,7 +563,7 @@ A **new dedicated CI pipeline** is required for gateway 2.0 live tests. 
Gateway | EPK computation | Yes | | | Single/hierarchical PK, hash versions 1 and 2, error cases (MultiHash V1, wrong component count) | | Operation filtering | Yes | | | All ResourceType × OperationType combos; asserts StoredProc Execute is rejected | | Header injection | Yes | | | Point vs feed EPK headers, proxy type headers, range-header un-padded form | -| Account-name + regional-account-name headers | Yes | | | Both `x-ms-thinclient-account-name` (account host label) and `x-ms-thinclient-regional-account-name` (`-` lowercase, matching the active `CosmosEndpoint` region) present on every Gateway 2.0 request (point, feed, batch, bulk, change feed). Multi-region client: assert regional value changes when the active endpoint switches regions. | +| Account-name RNTBD token | Yes | | | `GlobalDatabaseAccountName` (`0x00CE`, `String`) present in the RNTBD metadata stream of every Gateway 2.0 request (point, feed, batch, bulk, change feed). Value matches the host label of the account endpoint URL. | | SDK-supported-capabilities header | Yes | | | `x-ms-cosmos-sdk-supportedcapabilities` value emitted is the bitmask string for `(PartitionMerge \| IgnoreUnknownRntbdTokens)`, **not** `"0"`. Pin against the integer value sourced from .NET `SDKSupportedCapabilities.cs`. | | Consistency reconciliation: token + header encoding | Yes | | | RNTBD token `0x00F0` Byte round-trip for all 4 strategies; HTTP header `x-ms-cosmos-read-consistency-strategy` exact wire-string mapping for all 4 strategies; `Default` emits neither carrier on either transport. | | Consistency reconciliation: dual-header rejection | Yes | | | SDK never emits both `x-ms-consistency-level` AND `x-ms-cosmos-read-consistency-strategy` on V1; never emits both `ConsistencyLevel` and `ReadConsistencyStrategy` RNTBD tokens on V2. Verified across all 16 (CL × RCS, request-level × client-level) combinations. 
From 72f4ab9d0a6a169ac04832659526d1b6f63476d2 Mon Sep 17 00:00:00 2001
From: tvaron3
Date: Mon, 27 Apr 2026 11:22:54 -0700
Subject: [PATCH 20/48] Address round-7 PR feedback on Gateway 2.0 spec
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Rename THINCLIENT_PROXY_* Rust constants to GATEWAY20_* family (wire header strings remain x-ms-thinclient-* server-defined)
- 449 retry policy now explicitly exponential backoff (separate budget from 410/Gone)
- Strengthen positive-term ban: forbid is_gateway20_allowed / gateway20_allowed / enable_gateway20 anywhere in driver, SDK, perf crate, env vars, or test wiring; only negative-term names permitted
- Restore (§3.5) reference in EPK-computation bullet per reviewer

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .../docs/GATEWAY_20_SPEC.md | 180 ++++++++----------
 1 file changed, 80 insertions(+), 100 deletions(-)

diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md
index 52c7757335c..ffc00d4b09b 100644
--- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md
+++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md
@@ -1,4 +1,4 @@ - + # Gateway 2.0 Design Spec for Rust Driver & SDK **Status**: Draft / Iterating @@ -11,9 +11,10 @@ 1. [Overview](#1-overview) 2. [Motivation](#2-motivation) -3. [Current Rust State](#3-current-rust-state) -4. [Rust Implementation Plan](#4-rust-implementation-plan) -5. [Open Questions](#5-open-questions) +3. [Gating, Configuration & Override](#3-gating-configuration--override) +4. [Retry Behavior](#4-retry-behavior) +5. [Rust Implementation Plan](#5-rust-implementation-plan) +6. [Open Questions](#6-open-questions) ### Related Specs @@ -25,9 +26,13 @@ ## 1.
Overview -Gateway 2.0 (formerly "thin client") is a server-side proxy that allows SDK clients to route data-plane operations through a lightweight proxy endpoint instead of directly to backend replicas. It uses RNTBD binary serialization over the HTTP/2 protocol, with the proxy handling partition routing, replica selection, and load balancing. +Gateway 2.0 (formerly "thin client") is a server-side proxy that evolves the existing Gateway V1 path: SDK clients still target a regional proxy endpoint instead of opening connections to backend replicas, but the proxy uses RNTBD binary message encoding over HTTP/2 in place of REST/JSON, and the proxy itself owns replica selection, connection management, and load balancing within a partition. -**Naming**: Use "Gateway 2.0" consistently in all Rust code, docs, and comments. Avoid "thin client" except when referencing Java/.NET code or existing constants (`THINCLIENT_*`). +**RNTBD** stands for "Real Name to be Determined" — the (intentionally placeholder) code name for the proprietary message-encoding and wire-protocol format originally introduced for direct mode. Gateway 2.0 keeps the **message encoding** layer of RNTBD and moves the **wire protocol** to HTTP/2; direct mode used RNTBD over TCP. + +Direct mode was never in scope for the Rust SDK, so the rest of this spec compares Gateway 2.0 to **Gateway V1**, not to direct mode. + +**Naming**: Use "Gateway 2.0" consistently in all Rust code, docs, and comments. Reserve "thin client" for two narrow uses: (a) when referencing Java/.NET source symbols (`ThinClientHttpMessageHandler`, etc.), and (b) the literal wire-header names that contain `thinclient` (e.g. `x-ms-thinclient-proxy-operation-type`). --- @@ -35,19 +40,16 @@ Gateway 2.0 (formerly "thin client") is a server-side proxy that allows SDK clie ### Why Gateway 2.0? 
-Traditional Cosmos DB offers two connection modes: +Today the Rust SDK only supports Gateway V1: a shared, stateless HTTP/REST proxy that adds a network hop and provides **no latency SLA**. (Direct mode — where an SDK opens TCP connections directly to backend replicas — was never implemented for Rust because it introduces operational cost: COGS for replica connections, plus debugging complexity for customers when network paths to backend nodes break. It is not in scope.) -- **Gateway mode**: Simple HTTP/REST proxy — easy to use, but adds an extra network hop through a shared stateless gateway. No latency SLA guarantees because the gateway is a shared, best-effort proxy. -- **Direct mode**: SDK connects directly to backend replicas via TCP/RNTBD — provides latency SLA guarantees but requires the SDK to manage replica discovery, connection pooling, and partition routing itself. This adds significant complexity to the SDK and requires direct network access to backend nodes. - -**Gateway 2.0 bridges this gap.** It is a gateway mode with SLA latency guarantees — combining the operational simplicity of gateway mode (single endpoint, no direct backend connectivity required, firewall-friendly) with the performance characteristics of direct mode (RNTBD binary protocol, server-side partition routing, replica-aware load balancing). +**Gateway 2.0 bridges this gap.** It keeps the operational shape of Gateway V1 (one regional endpoint, no direct backend connectivity required) and adds the performance characteristics of direct mode (RNTBD binary message encoding, server-side replica selection, replica-aware load balancing) — so customers get latency SLAs without taking on the operational burden direct mode imposed. 
### Key Benefits -- **SLA latency guarantees** — Unlike traditional gateway, Gateway 2.0 plans to provide contractual latency commitments comparable to direct mode -- **Simplified networking** — Clients connect to a single regional proxy endpoint over HTTPS; no need to open firewall rules to individual backend replicas -- **Reduced SDK complexity** — The proxy handles replica discovery, connection management, and replica-level routing within a partition; the SDK only needs RNTBD serialization, partition-level routing (PKRange resolution / EPK computation), and endpoint selection -- **HTTP/2 multiplexing** — Multiple concurrent operations share a single TCP connection, reducing connection overhead vs. direct mode's per-replica TCP connections +- **SLA latency guarantees** — Unlike Gateway V1, Gateway 2.0 plans to provide contractual latency commitments comparable to direct mode +- **Simplified networking** — Clients connect to a single regional proxy endpoint over HTTP/2 on a non-443 port (e.g. 1125). The endpoint IP set is dynamic and may shift across Gateway 2.0 federations; firewall rules MUST scope by hostname/port, not by specific IPs. +- **Reduced operational cost** — The proxy handles replica discovery, connection management, and replica-level routing within a partition; the client stays partition-aware (resolves PKRange, computes EPK) but is **never replica-aware**, avoiding the COGS and customer-side debugging burden of maintaining one connection per backend replica. +- **HTTP/2 multiplexing** — Multiple concurrent operations share a single TCP connection. Note that Gateway V1 also supports HTTP/2 with multiplexing today; the connection-overhead win is relative to direct mode's per-replica TCP connections, not relative to Gateway V1. 
- **Transparent failover** — The proxy handles replica failover within a partition; the SDK handles regional failover across proxy endpoints ### Design Philosophy @@ -70,118 +72,94 @@ Gateway 2.0 moves **replica-level** routing intelligence from the SDK into the s ### Connection Mode Comparison -| Aspect | Gateway | Gateway 2.0 | Direct | +| Aspect | Gateway V1 | Gateway 2.0 | Direct (not in scope for Rust) | | --- | --- | --- | --- | | Latency SLA | No | **Yes** | Yes | | Simple Network | Yes | Yes | No | -| Protocol | REST/HTTP | RNTBD/HTTP2 | RNTBD/TCP | +| Protocol | REST/HTTP, HTTP/2 + multiplexing supported | RNTBD message encoding over HTTP/2 | RNTBD over TCP | | Replica Mgmt | Gateway/Proxy | Proxy | SDK | | Partition Route | Gateway/Proxy | Proxy | SDK | | Regional Route | SDK | SDK | SDK | -| SDK Complexity | Medium | Medium | High | -| Firewall Rules | 1 endpoint | 1 endpoint | N replicas | +| Operational Cost (COGS + debug) | Low | Low | High | +| Firewall Rules | 1 endpoint (443) | 1 endpoint (non-443, e.g. 1125; dynamic IPs) | N replicas | --- -## 3. Current Rust State +## 3. Gating, Configuration & Override + +Gateway 2.0 routing is decided **once per request** in the driver's `resolve_endpoint` stage. After that, downstream pipeline stages MUST trust `RoutingDecision.transport_mode` and not re-derive eligibility. -The Rust driver (`azure_data_cosmos_driver`) already has significant gateway 2.0 scaffolding. +### 3.1 The `prefer_gateway20` formula -### 3.1 Already Implemented — endpoint & transport +```text +prefer_gateway20 = !options.gateway20_disabled + && account.has_thin_client_endpoints() +``` + +The account-side check (`has_thin_client_endpoints()`) reads the cached account metadata. The client-side check (`gateway20_disabled`) is the only public toggle. 
-- **`CosmosEndpoint`** — `gateway20_url: Option` field, `regional_with_gateway20()`, `uses_gateway20()`, `selected_url()` methods -- **`TransportMode::Gateway20`** enum variant in pipeline components -- **`RoutingDecision`** — carries `transport_mode` that distinguishes gateway vs gateway 2.0 -- **`ConnectionPoolOptions`** — `is_gateway20_allowed: bool` config (see §3.4 for gating model) -- **`CosmosTransport`** — `dataplane_gateway20_transport: OnceLock`, lazy init with `AdaptiveTransport::gateway20()` -- **`AdaptiveTransport::ShardedGateway20`** variant — HTTP/2 only with prior knowledge (no HTTP/1.x fallback; the proxy does not accept HTTP/1.x — see Open Question Q1, resolved) -- **`HttpClientConfig::dataplane_gateway20()`** — HTTP/2-only config; HTTP/2 negotiation failure surfaces as a transport error (handled by the existing retry policies) rather than downgrading -- **`TransportKind::Gateway20`** in diagnostics +### 3.2 Operator override: `CosmosClientOptions::gateway20_disabled` -### 3.2 Already Implemented — account metadata & routing +Default `false`. Customers and operators MAY set `gateway20_disabled = true` on `CosmosClientOptions` to force every request from the client to route through Gateway V1, even when the account advertises Gateway 2.0 endpoints and the operation would otherwise be eligible. 
-- **`LocationStateStore`** — `gateway20_enabled` flag, passes through to endpoint construction -- **`AccountProperties::has_thin_client_endpoints()`** (`account_metadata_cache.rs:191`) — detection helper -- **`AccountProperties::thin_client_writable_regions()` / `thin_client_readable_regions()`** (`account_metadata_cache.rs:197,205`) — region accessors -- **`parse_thin_client_locations()`** — parser for `thinClient(Readable|Writable)Locations` -- **`build_account_endpoint_state()`** (`routing_systems.rs`) — resolves gateway 2.0 URLs from account properties -- **Existing tests** in `routing_systems.rs:218–289` already exercise GW20 endpoint construction with both readable and writable thin-client locations -- **`resolve_endpoint()`** in operation pipeline — selects gateway 2.0 URL when `prefer_gateway20` is true (see §3.4) +⚠️ **Setting this flag voids the latency-SLA story Gateway 2.0 is being built to deliver. It also impacts the ability to receive 24/7 Microsoft support for performance regressions on this client. Use only when explicitly directed by Microsoft Support during incident triage.** The flag is intentionally **not** exposed via environment variable to discourage casual / fleet-wide enablement; operators who need it must opt in per-client through code. -### 3.3 Already Implemented — EPK & constants +All settings, options, and internal flags **must use a negative-term name** (`gateway20_disabled`, `gateway20_suppressed`, etc.) so that **default values mean Gateway 2.0 is enabled**. Positive-term names (`is_gateway20_allowed`, `gateway20_allowed`, `enable_gateway20`, etc.) are not permitted anywhere — driver, SDK, perf crate, env vars, or test wiring. `gateway20_disabled` is the single supported disablement mechanism; there is no `AZURE_COSMOS_*` env var that toggles Gateway 2.0. -- **`EffectivePartitionKey::compute()` / `::compute_range()`** — in `azure_data_cosmos_driver::models::effective_partition_key` (MultiHash-aware, hierarchical-PK correct). 
This is the canonical path and is what Gateway 2.0 header injection MUST call. Both functions return `azure_core::Result` (per PR #4087 review: MultiHash-requires-V2 and component-count checks are runtime errors, not `debug_assert`s, so a user gets `Err` rather than a panic on malformed input). -- **Constants** (in `azure_data_cosmos::constants`): `THINCLIENT_PROXY_OPERATION_TYPE` → `x-ms-thinclient-proxy-operation-type`, `THINCLIENT_PROXY_RESOURCE_TYPE` → `x-ms-thinclient-proxy-resource-type`. Phase 2 reuses these verbatim. The existing `START_EPK` (= `x-ms-start-epk`) / `END_EPK` (= `x-ms-end-epk`) constants are **not** used on Gateway 2.0 requests; Phase 2 introduces new `THINCLIENT_RANGE_MIN` (= `x-ms-thinclient-range-min`) / `THINCLIENT_RANGE_MAX` (= `x-ms-thinclient-range-max`) constants per Q3 resolution. See §Phase 2 "Header naming" for mapping. -- **Perf crate** — `gateway20_allowed` config wiring +### 3.3 EPK Range types — driver crate is canonical -### 3.4 Gating model (single source of truth) +Every Gateway 2.0 EPK-range representation lives in the **driver crate** (`azure_data_cosmos_driver`): -Two independent guards exist today (`is_gateway20_allowed` is checked in both routing and `get_dataplane_transport`). Per PR #3942 review, **routing is the single source of truth**; the transport-layer guard is intentional defense-in-depth and is technically dead code given current callers. 
+| Type | Role | +| --- | --- | +| `azure_data_cosmos_driver::models::range::EpkRange` | Generic typed EPK range (`min` / `max` / `is_min_inclusive` / `is_max_inclusive` + `contains` / `is_empty` / `check_overlapping` / `Display` `[a,b)` form) | +| `azure_data_cosmos_driver::models::partition_key_range::PartitionKeyRange` | Service model with `min_inclusive: EffectivePartitionKey` / `max_exclusive: EffectivePartitionKey` and full PKR metadata | +| `azure_data_cosmos_driver::models::effective_partition_key::EffectivePartitionKey` | Strongly-typed EPK newtype with `compute_range()` returning `std::ops::Range` | -Invariants this spec locks in: +EPK header injection MUST consume `EffectivePartitionKey::compute_range()` directly and serialize through the driver crate's existing types. It MUST NOT introduce a new EPK-range struct, and MUST NOT depend on any SDK-crate analog (`azure_data_cosmos::routing::range::Range`, `azure_data_cosmos::routing::partition_key_range::PartitionKeyRange`, `azure_data_cosmos::hash::EffectivePartitionKey`). The SDK has no Gateway-2.0 surface area whatsoever — the SDK calls the generic `CosmosDriver::execute_operation` interface and the driver decides Gateway 2.0 vs Gateway V1 internally. -- `prefer_gateway20` is computed **once per request** during `resolve_endpoint` from: - `!options.gateway20_disabled && connection_pool().is_gateway20_allowed() && account.has_thin_client_endpoints()` -- After `resolve_endpoint`, downstream stages MUST trust `RoutingDecision.transport_mode` and not re-derive eligibility. -- **Operator override: `CosmosClientOptions::gateway20_disabled` (default `false`)** — Customers and operators MAY set `gateway20_disabled = true` on `CosmosClientOptions` to force every request from the client to route through the standard gateway, even when the account advertises Gateway 2.0 endpoints and the operation would otherwise be eligible. 
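The rule above says EPK header injection must consume a `compute_range()` result directly. The sketch below does that with a plain `std::ops::Range` instead of a new range struct. `Epk` is a hypothetical stand-in for the driver's `EffectivePartitionKey`. The header names are the wire strings from the Phase 2 header table. The point-versus-feed split follows the header-injection steps: one EPK header for point ops, and a min/max range pair for cross-partition feeds.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the driver's `EffectivePartitionKey`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Epk(pub String);

/// What the request targets, decided before header injection.
pub enum EpkTarget {
    /// Point op or single-logical-partition query: one EPK.
    Single(Epk),
    /// Cross-partition feed/query: a `Range<Epk>` as returned by a
    /// `compute_range()`-style call, used as-is.
    Range(Range<Epk>),
}

// Wire header names from the Phase 2 header table.
const EFFECTIVE_PARTITION_KEY: &str = "x-ms-effective-partition-key";
const GATEWAY20_RANGE_MIN: &str = "x-ms-thinclient-range-min";
const GATEWAY20_RANGE_MAX: &str = "x-ms-thinclient-range-max";

/// EPK headers for a Gateway 2.0 request: either the single EPK header
/// or the range pair, never both.
pub fn epk_headers(target: &EpkTarget) -> Vec<(&'static str, String)> {
    match target {
        EpkTarget::Single(epk) => vec![(EFFECTIVE_PARTITION_KEY, epk.0.clone())],
        EpkTarget::Range(range) => vec![
            (GATEWAY20_RANGE_MIN, range.start.0.clone()),
            (GATEWAY20_RANGE_MAX, range.end.0.clone()),
        ],
    }
}
```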
+--- - ⚠️ **Setting this flag voids the latency-SLA story Gateway 2.0 is being built to deliver. It also impacts the ability to receive 24/7 Microsoft support for performance regressions on this client. Use only when explicitly directed by Microsoft Support during incident triage.** The flag is intentionally **not** exposed via environment variable to discourage casual / fleet-wide enablement; operators who need it must opt in per-client through code. +## 4. Retry Behavior - The internal `ConnectionPoolOptions::is_gateway20_allowed` flag and its env var `AZURE_COSMOS_CONNECTION_POOL_IS_GATEWAY20_ALLOWED` are pre-existing bring-up scaffolding, slated for removal in Phase 5 cleanup. The public `gateway20_disabled` setting is the single supported disablement mechanism going forward. +Gateway 2.0 reuses the standard retry pipeline. Two status codes have Gateway-2.0-relevant rules; one has its own dedicated policy. -### 3.5 Known broken / do-not-use +### 4.1 HTTP 449 (Retry-With) — dedicated policy, separate from 410/Gone -- **`azure_data_cosmos::hash::get_hashed_partition_key_string()`** (called from `container_connection.rs:87`) — a legacy SDK-side function that is **a known-broken stub for MultiHash (hierarchical-PK) containers**. PR #4087's description explicitly calls it out as awaiting the SDK-to-driver cutover. **Do NOT** wire Phase 2 header injection to this function; use `EffectivePartitionKey::compute()` / `::compute_range()` (§3.3). +449 typically signals a server-side precondition failure (e.g. etag conflict on a single document being updated by many writers). Aggressive retry loops on 449 amplify pathological client patterns — for example, a numeric-ID generator where dozens or hundreds of threads patch the same document. The retry policy MUST optimize for the natural case (precondition failure that resolves with a brief wait) without amplifying abuse. 
-### 3.6 Not Yet Implemented (Gaps) +- **Few attempts** (≤ 3) with **exponential backoff** between them (separate, looser schedule from the one used for 410/Gone). +- Retry against the **same** endpoint; do not switch regions on 449. +- **Do not** share the retry budget with 410/Gone — 449 has its own dedicated policy. +- **Gateway V1 uses the identical 449 policy.** When the server-side adds a "suppress 449" capability (under design with the server team — see Kiran for context), the client negotiates it via the `SdkSupportedCapabilities` channel and treats 449 as a non-retryable terminal error. Until that capability ships, both transports retry per the policy above. -1. **RNTBD serialization/deserialization** — No binary protocol encoding/decoding exists. Both directions live in the driver: serialization in `rntbd/request.rs`, deserialization in `rntbd/response.rs`. The SDK never handles raw RNTBD bytes — the response decode happens inside the driver and the SDK only sees typed results. See §4.2 step 6. -2. **Gateway 2.0 header injection** — Gateway 2.0 proxy headers and EPK range headers are not applied to requests on the Gateway 2.0 path -3. **Supported operation filtering** — No `IsOperationSupportedByThinClient()` equivalent -4. **`x-ms-cosmos-use-thinclient` header** on account metadata requests (to trigger thin-client endpoint advertisement) -5. **SDK-to-driver cutover for EPK** — SDK call sites (`feed_range_from_partition_key`, `container_connection.rs:87`) still call the broken SDK hash; they must route through the driver's `EffectivePartitionKey::compute()` -6. **Session token handling** — Gateway 2.0 may handle session tokens differently (partition-key-range-id prefix) -7. **Rollout/cutover policy clarification** — Document the intended enablement and cutover behavior (see Phase 4); there is intentionally **no** Gateway 2.0-specific failure-driven fallback to the standard gateway. 
The supported operator override is `CosmosClientOptions::gateway20_disabled` (§3.4) — a per-client opt-out with explicit SLA / support warnings. -8. **Integration/E2E tests** — No gateway 2.0 test coverage beyond the routing-systems unit tests -9. **Fault injection** — No gateway 2.0 fault injection scenarios -10. **Constants cross-crate visibility** — _Resolved_. Per PR review (analogrelay): the SDK has no Gateway-2.0 surface area whatsoever. `THINCLIENT_PROXY_*`, `THINCLIENT_RANGE_MIN/MAX`, and Gateway-2.0-specific header constants live exclusively in `azure_data_cosmos_driver::constants`; **no SDK re-export**. The SDK calls the generic `CosmosDriver::execute_operation` interface and the driver decides Gateway 2.0 vs standard gateway internally. The legacy `START_EPK` / `END_EPK` constants in `azure_data_cosmos::constants` remain for any non-Gateway-2.0 callers but are not used on the Gateway 2.0 path. Phase 2 deliverable includes the move. -11. **EPK Range type consolidation** — _Resolved_. 
Audit results across both crates: +### 4.2 HTTP 404 with sub-status `1002` (`PARTITION_KEY_RANGE_GONE`) - | Crate | Type | Verdict | - | --- | --- | --- | - | `azure_data_cosmos_driver::models::range::EpkRange` | Generic range with `min` / `max` / `is_min_inclusive` / `is_max_inclusive`, plus `contains` / `is_empty` / `check_overlapping` / `Display` (`[a,b)` form) | **Canonical** for typed EPK ranges | - | `azure_data_cosmos_driver::models::partition_key_range::PartitionKeyRange` | Service model with `min_inclusive: EffectivePartitionKey` / `max_exclusive: EffectivePartitionKey` and full PKR metadata (id, rid, parents, throughput, status, lsn) | **Canonical** for cached PKR entries | - | `azure_data_cosmos_driver::models::effective_partition_key::EffectivePartitionKey` | Strongly-typed EPK newtype with `compute_range()` returning `std::ops::Range` | **Canonical** EPK value type | - | `azure_data_cosmos::routing::range::Range` | SDK-side generic range | **Not used** on the Gateway 2.0 path — legacy, kept only for non-Gateway-2.0 SDK callers | - | `azure_data_cosmos::routing::partition_key_range::PartitionKeyRange` | SDK-side PKR with `min_inclusive: String` / `max_exclusive: String` (untyped) | **Not used** on the Gateway 2.0 path | - | `azure_data_cosmos::hash::EffectivePartitionKey` | SDK-side `EffectivePartitionKey(String)` newtype, distinct from the driver's | **Not used** on the Gateway 2.0 path | +Refresh the PKRange cache, then retry. **Always prefer a remote region for the retry** when one is available in the client's preferred-region list — the local region is suspected of carrying the stale routing, so pinning the retry to the same Gateway 2.0 endpoint that just returned 1002 reproduces the bug. **PLF takes precedence**: if PLF (per `PARTITION_LEVEL_FAILOVER_SPEC.md`) has already pinned a region for this PKRangeId, the PLF region wins over the "prefer remote" hint. - **Decision**: every Gateway 2.0 EPK-range representation lives in the **driver crate**. 
Phase 2's EPK header injection MUST consume `EffectivePartitionKey::compute_range()` directly and serialize through the driver crate's existing types; it MUST NOT introduce a new EPK-range struct, and MUST NOT depend on any of the SDK-crate analogs. Per item 10 there is no SDK Gateway-2.0 surface, so the SDK-crate types stay untouched and unreferenced on this path. Phase 2 PR review enforces both rules. -12. **Gateway 2.0 retry behavior for region-routed status codes** — Beyond the timeout / 408 handling already deferred to `TRANSPORT_PIPELINE_SPEC.md`, the Gateway 2.0 path inherits these region-aware retry rules from the standard pipeline (no Gateway-2.0-specific override needed): - - **HTTP 449 (Retry-With)** — Retry against the **same** Gateway 2.0 endpoint with the standard backoff schedule. **Do not** switch regions on 449. **Do not** fall back to standard gateway on 449 — the proxy is healthy; the backend asked for a retry. - - **HTTP 404 with sub-status `1002` (`PARTITION_KEY_RANGE_GONE`)** — Refresh the PKRange cache, then retry. **Always prefer a remote region for the retry** when one is available in the client's preferred-region list — the local region is suspected of carrying the stale routing, so pinning the retry to the same Gateway 2.0 endpoint that just returned 1002 reproduces the bug. **PLF takes precedence**: if PLF (per `PARTITION_LEVEL_FAILOVER_SPEC.md`) has already pinned a region for this PKRangeId, the PLF region wins over the "prefer remote" hint. +These rules apply uniformly to V1 (HTTP) and V2 (RNTBD) — the retry policy operates on the resolved `(status_code, sub_status)` pair before the transport-specific deserializer ever sees the body. - These rules apply uniformly to V1 (HTTP) and V2 (RNTBD) — the retry policy operates on the resolved `(status_code, sub_status)` pair before the transport-specific deserializer ever sees the body. 
+Beyond 449 and 404/1002, Gateway 2.0 follows the timeout/408 handling defined in `TRANSPORT_PIPELINE_SPEC.md` — no Gateway-2.0-specific override is introduced. --- -## 4. Rust Implementation Plan +## 5. Rust Implementation Plan -### 4.1 Current Request Flow (Gateway 1.0) +### 5.1 Current Request Flow (Gateway V1) 1. `ContainerClient::create_item(partition_key, item, options)` calls into `ContainerClient` -2. `container_connection.rs` serializes `T` to `&[u8]`, computes EPK (via the broken SDK hash today — see §3.5), resolves PKRange +2. `container_connection.rs` serializes `T` to `&[u8]`, computes EPK (via the SDK-side hash today, which does not handle MultiHash containers correctly), resolves PKRange 3. `CosmosDriver::execute_operation()` enters the Operation Pipeline (7-stage loop) 4. `resolve_endpoint()` selects a gateway endpoint 5. Transport Pipeline applies cosmos headers, signs request 6. HTTP/REST request sent to Cosmos Gateway (shared proxy, no SLA) -### 4.2 Target Request Flow (Gateway 2.0) +### 5.2 Target Request Flow (Gateway 2.0) 1. `ContainerClient::create_item(partition_key, item, options)` calls into `ContainerClient` 2. `container_connection.rs` serializes `T` to `&[u8]`; EPK computation is deferred to the driver (via `EffectivePartitionKey::compute()` / `::compute_range()`), which then resolves PKRange 3. `CosmosDriver::execute_operation()` enters the Operation Pipeline (7-stage loop) -4. `resolve_endpoint()` prefers gateway 2.0 endpoint (if `prefer_gateway20` per §3.4) +4. `resolve_endpoint()` prefers gateway 2.0 endpoint (if `prefer_gateway20` per §3.1) 5. 
Transport Pipeline checks `is_operation_supported_by_gateway20()`: - **YES**: Inject gateway 2.0 headers + RNTBD serialize → HTTP/2 POST to Gateway 2.0 Proxy (SLA) - **NO**: Standard HTTP/REST request to Cosmos Gateway (eligibility fallback — per-request, deterministic) @@ -289,7 +267,7 @@ This phase wires RNTBD serialization into the existing transport pipeline and ad - **Request body wrapping** — Serialize the entire request (headers + body) into RNTBD binary format and POST as the HTTP/2 body. - **Response unwrapping** — Deserialize the RNTBD response body back into `CosmosResponseHeaders` + raw document bytes. - **Eligibility fallback** — Operation ineligible for Gateway 2.0 → route through standard gateway for this single request (per-request, deterministic). See §Phase 4 for the distinct failure-driven fallback. -- **Constants placement** — Move `THINCLIENT_PROXY_*` (and any other Gateway-2.0-specific header constants) into `azure_data_cosmos_driver::constants` as part of Phase 2. **No SDK re-export** — the SDK has no Gateway-2.0 awareness; it invokes the generic `CosmosDriver::execute_operation` interface and the driver decides Gateway 2.0 vs standard gateway internally. See §3.6-10 (resolved). +- **Constants placement & naming** — Relocate the existing `THINCLIENT_PROXY_*` constants into `azure_data_cosmos_driver::constants` and **rename them to the `GATEWAY20_*` family** as part of Phase 2 (the wire header strings stay the same — they are server-defined and currently `x-ms-thinclient-proxy-*` — only the Rust identifier changes). Concretely: `THINCLIENT_PROXY_OPERATION_TYPE` → `GATEWAY20_OPERATION_TYPE`, `THINCLIENT_PROXY_RESOURCE_TYPE` → `GATEWAY20_RESOURCE_TYPE`, and any new EPK-range / capability constants follow the same `GATEWAY20_*` prefix. **No SDK re-export** — the SDK has no Gateway-2.0 awareness; it invokes the generic `CosmosDriver::execute_operation` interface and the driver decides Gateway 2.0 vs Gateway V1 internally. 
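The constants relocation and rename can be sketched as follows. Only the Rust identifiers change; the wire strings are the server-defined values from the Phase 2 header table. `proxy_type_headers` is a hypothetical helper. Its `u16` parameter type and the decimal string encoding of the numeric operation and resource types are assumptions.

```rust
/// Sketch of the driver-crate constants after the THINCLIENT_* -> GATEWAY20_*
/// rename. Wire strings stay `x-ms-thinclient-*` (server-defined).
pub mod constants {
    pub const GATEWAY20_OPERATION_TYPE: &str = "x-ms-thinclient-proxy-operation-type";
    pub const GATEWAY20_RESOURCE_TYPE: &str = "x-ms-thinclient-proxy-resource-type";
    pub const GATEWAY20_RANGE_MIN: &str = "x-ms-thinclient-range-min";
    pub const GATEWAY20_RANGE_MAX: &str = "x-ms-thinclient-range-max";
    pub const GATEWAY20_USE_THINCLIENT: &str = "x-ms-cosmos-use-thinclient";
}

/// The two proxy type headers emitted on every Gateway 2.0 request.
/// Assumption: numeric types are sent as decimal strings.
pub fn proxy_type_headers(operation_type: u16, resource_type: u16) -> [(&'static str, String); 2] {
    [
        (constants::GATEWAY20_OPERATION_TYPE, operation_type.to_string()),
        (constants::GATEWAY20_RESOURCE_TYPE, resource_type.to_string()),
    ]
}
```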
#### Supported Operations @@ -317,13 +295,15 @@ These are wire-level HTTP/2 request headers on the outer POST to the proxy. They | Header (wire) | Rust constant (crate) | Semantics | When emitted | | --- | --- | --- | --- | -| `x-ms-thinclient-proxy-operation-type` | `THINCLIENT_PROXY_OPERATION_TYPE` (driver) | Numeric operation type | Every Gateway 2.0 request | -| `x-ms-thinclient-proxy-resource-type` | `THINCLIENT_PROXY_RESOURCE_TYPE` (driver) | Numeric resource type | Every Gateway 2.0 request | +| `x-ms-thinclient-proxy-operation-type` | `GATEWAY20_OPERATION_TYPE` (driver) | Numeric operation type | Every Gateway 2.0 request | +| `x-ms-thinclient-proxy-resource-type` | `GATEWAY20_RESOURCE_TYPE` (driver) | Numeric resource type | Every Gateway 2.0 request | | `x-ms-effective-partition-key` | **NEW** — `EFFECTIVE_PARTITION_KEY` (driver) | Canonical EPK hex | Point ops only | | `x-ms-documentdb-partitionkey` | existing `PARTITION_KEY` constant (SDK) | JSON-encoded partition-key value | Point ops AND single-logical-partition query ops, alongside `x-ms-effective-partition-key` | -| `x-ms-thinclient-range-min` | **NEW** — `THINCLIENT_RANGE_MIN` (driver) | Lower bound of EPK range | Feed / cross-partition ops only | -| `x-ms-thinclient-range-max` | **NEW** — `THINCLIENT_RANGE_MAX` (driver) | Upper bound of EPK range | Feed / cross-partition ops only | -| `x-ms-cosmos-use-thinclient` | **NEW** (driver) | Instructs account-metadata response to advertise thin-client endpoints | Account metadata fetches only | +| `x-ms-thinclient-range-min` | **NEW** — `GATEWAY20_RANGE_MIN` (driver) | Lower bound of EPK range | Feed / cross-partition ops only | +| `x-ms-thinclient-range-max` | **NEW** — `GATEWAY20_RANGE_MAX` (driver) | Upper bound of EPK range | Feed / cross-partition ops only | +| `x-ms-cosmos-use-thinclient` | **NEW** — `GATEWAY20_USE_THINCLIENT` (driver) | Instructs account-metadata response to advertise thin-client endpoints | Account metadata fetches only | + +> 
Wire-header strings (`x-ms-thinclient-*`) are server-defined and unchanged; the Rust-side identifiers use the `GATEWAY20_*` prefix. Per Q3 resolution, the Gateway 2.0 proxy requires the Java header names `x-ms-thinclient-range-min` / `x-ms-thinclient-range-max` (it does **not** accept `x-ms-start-epk` / `x-ms-end-epk`). Phase 2 introduces the new constants above; the existing `START_EPK` / `END_EPK` constants are not emitted on the Gateway 2.0 path. @@ -407,7 +387,7 @@ EDIT src/driver/transport/transport_pipeline.rs — Branch on TransportMode in EDIT src/driver/transport/cosmos_headers.rs — Add gateway 2.0 header application EDIT src/driver/transport/mod.rs — Add is_operation_supported_by_gateway20() EDIT src/driver/pipeline/components.rs — Add EPK fields to TransportRequest if needed -EDIT src/driver/constants.rs (or NEW) — Relocate THINCLIENT_PROXY_* constants from azure_data_cosmos to azure_data_cosmos_driver (no SDK re-export, per §3.6-10) +EDIT src/driver/constants.rs (or NEW) — Relocate + rename THINCLIENT_PROXY_* → GATEWAY20_* constants from azure_data_cosmos to azure_data_cosmos_driver (no SDK re-export) EDIT sdk/cosmos/azure_data_cosmos/src/... — Replace SDK-side get_hashed_partition_key_string callers with driver's EffectivePartitionKey::compute() ``` @@ -417,13 +397,13 @@ EDIT sdk/cosmos/azure_data_cosmos/src/... — Replace SDK-side get_hashe **Crate**: `azure_data_cosmos_driver` -> Most of Phase 3 is **audit / verification** against scaffolding already in place (§3.1, §3.2). Only the `x-ms-cosmos-use-thinclient` request header is net-new code. Noted here because the dependency graph lists Phase 3 as a prerequisite for Phase 2; in practice the verification items can happen in parallel with Phase 1 and the one real code change can ride with Phase 2 if convenient. +> Most of Phase 3 is **audit / verification** against scaffolding already in place. Only the `x-ms-cosmos-use-thinclient` request header is net-new code. 
Noted here because the dependency graph lists Phase 3 as a prerequisite for Phase 2; in practice the verification items can happen in parallel with Phase 1 and the one real code change can ride with Phase 2 if convenient. #### What Will Be Done -- **Verify** account metadata cache parses `thinClientReadableLocations` / `thinClientWritableLocations` into `CosmosEndpoint::gateway20_url` (existing, per §3.2) +- **Verify** account metadata cache parses `thinClientReadableLocations` / `thinClientWritableLocations` into `CosmosEndpoint::gateway20_url` - **Confirm** `build_account_endpoint_state()` constructs `CosmosEndpoint::regional_with_gateway20()` correctly in multi-region accounts (existing tests at `routing_systems.rs:218–289` already cover this) -- **Verify** `AccountProperties::has_thin_client_endpoints()` is used as the gating signal per §3.4 +- **Verify** `AccountProperties::has_thin_client_endpoints()` is used as the gating signal per §3.1 - **Add** `x-ms-cosmos-use-thinclient` request header on account metadata fetches (new code) - **Test** endpoint discovery with live account that has gateway 2.0 enabled (handled by Phase 6 live pipeline) @@ -491,15 +471,15 @@ Gateway 2.0 is **on by default** when the account metadata advertises Gateway 2. #### What Will Be Done -- **Auto-detection** — When account metadata includes `thinClientReadableLocations` / `thinClientWritableLocations`, the driver automatically prefers gateway 2.0 for eligible operations (per §3.4). No user opt-in required. -- **Operator override** — `CosmosClientOptions::gateway20_disabled` (default `false`) is a public, documented setting for forcing standard-gateway routing per-client. **It carries an explicit warning that flipping it voids Gateway 2.0's latency SLA and impacts 24/7 Microsoft support eligibility for performance regressions.** Intentionally not exposed via env var. See §3.4 for the full normative wording. 
The legacy `ConnectionPoolOptions::is_gateway20_allowed` bring-up scaffolding is removed in Phase 5; `gateway20_disabled` is the single supported disablement mechanism. +- **Auto-detection** — When account metadata includes `thinClientReadableLocations` / `thinClientWritableLocations`, the driver automatically prefers gateway 2.0 for eligible operations (per §3.1). No user opt-in required. +- **Operator override** — `CosmosClientOptions::gateway20_disabled` (default `false`) is a public, documented setting for forcing Gateway V1 routing per-client. **It carries an explicit warning that flipping it voids Gateway 2.0's latency SLA and impacts 24/7 Microsoft support eligibility for performance regressions.** Intentionally not exposed via env var. See §3.2 for the full normative wording. There is no positive-term internal flag; `gateway20_disabled` is the single supported disablement mechanism. - **Diagnostics** — `CosmosDiagnostics` should report when a request used gateway 2.0 vs standard gateway (already partially done via `TransportKind::Gateway20`). - **User agent** — Update SDK user agent string to indicate gateway 2.0 capability. - **EPK cutover** — Replace SDK-side callers of `get_hashed_partition_key_string` with calls into the driver's `EffectivePartitionKey::compute()` / `::compute_range()` (this is the cutover PR #4087 flagged). Gateway 2.0 header injection depends on this being correct for hierarchical-PK containers. #### Auto-Detection Flow -When account metadata includes `thinClientReadableLocations`, gateway 2.0 is enabled automatically (internal). `CosmosEndpoint` gets `gateway20_url` and `resolve_endpoint()` prefers Gateway 2.0 (per §3.4's single-source-of-truth rule). No user configuration needed — transparent to the caller. +When account metadata includes `thinClientReadableLocations`, gateway 2.0 is enabled automatically (internal). 
`CosmosEndpoint` gets `gateway20_url` and `resolve_endpoint()` prefers Gateway 2.0 (per §3.1's single-source-of-truth rule). No user configuration needed — transparent to the caller. #### Files Changed @@ -507,7 +487,7 @@ When account metadata includes `thinClientReadableLocations`, gateway 2.0 is ena EDIT src/driver_bridge.rs — Ensure internal config passes through EDIT src/handler/container_connection.rs — Route EPK through driver's EffectivePartitionKey::compute() EDIT src/partition_key.rs — Update feed_range_from_partition_key call site -EDIT src/constants.rs — Remove THINCLIENT_PROXY_* constants (relocated to driver crate, no SDK re-export, per §3.6-10) +EDIT src/constants.rs — Remove THINCLIENT_PROXY_* constants (relocated + renamed to GATEWAY20_* in driver crate, no SDK re-export) ``` --- @@ -579,7 +559,7 @@ A **new dedicated CI pipeline** is required for gateway 2.0 live tests. Gateway | Bulk | | | Yes | Fan-out CRUD, distinct from Batch | | Change feed | | | Yes | LatestVersion, incremental | | Retry: 408 timeout | | Yes | | Cross-region for reads, local-only for writes | -| Retry: 449 Retry-With | | Yes | | Same Gateway 2.0 endpoint, standard backoff, no region switch, no fallback to standard gateway | +| Retry: 449 Retry-With | | Yes | | Dedicated 449 policy (≤ 3 attempts, exponential backoff, separate budget from 410/Gone), same Gateway 2.0 endpoint, no region switch, no fallback to Gateway V1 | | Retry: 503 | | Yes | | Regional failover via existing retry policies | | Retry: 410 Gone | | Yes | | PKRange refresh (sub-status specific); NameCacheStale → collection cache | | Retry: 404 / sub-status 1002 (PartitionKeyRangeGone) | | Yes | | PKRange cache refresh + retry against **remote-preferred** region; assert local-region retry only when no other region available; assert PLF region wins when PLF has pinned the PKRangeId | @@ -612,8 +592,8 @@ A **new dedicated CI pipeline** is required for gateway 2.0 live tests. Gateway --- -## 5. Open Questions +## 6. 
Open Questions - **Q1 — HTTP/2 prior knowledge vs ALPN**: _Resolved_. Gateway 2.0 always uses HTTP/2; the proxy does not accept HTTP/1.x. Rust uses HTTP/2 with prior knowledge on the Gateway 2.0 transport (no ALPN fallback to HTTP/1.x). The broader ALPN default in `TRANSPORT_PIPELINE_SPEC.md` does **not** apply to Gateway 2.0; if HTTP/2 negotiation fails, the request fails and the existing retry policies handle it. - **Q2 — Live test account provisioning**: Cosmos DB account configuration flags required to enable Gateway 2.0 endpoints are not part of the standard Bicep templates. _Resolution_: hardcode a dedicated, pre-provisioned Gateway 2.0 account for the gateway 2.0 live tests pipeline and reuse it across runs (rather than provisioning per-run via Bicep). Account name and credentials stored in pipeline secrets (`AZURE_COSMOS_GW20_ENDPOINT`, `AZURE_COSMOS_GW20_KEY`); pipeline reads endpoint from environment variables. -- **Q3 — EPK range header names**: _Resolved_. The Gateway 2.0 proxy requires the Java header names `x-ms-thinclient-range-min` / `x-ms-thinclient-range-max`. Phase 2 introduces new constants (`THINCLIENT_RANGE_MIN`, `THINCLIENT_RANGE_MAX`) on the Gateway 2.0 path; the existing `START_EPK` / `END_EPK` (`x-ms-start-epk` / `x-ms-end-epk`) constants remain for any non-Gateway-2.0 callers but are **not** emitted on Gateway 2.0 requests. +- **Q3 — EPK range header names**: _Resolved_. The Gateway 2.0 proxy requires the Java header names `x-ms-thinclient-range-min` / `x-ms-thinclient-range-max`. Phase 2 introduces new constants (`GATEWAY20_RANGE_MIN`, `GATEWAY20_RANGE_MAX`) on the Gateway 2.0 path; the existing `START_EPK` / `END_EPK` (`x-ms-start-epk` / `x-ms-end-epk`) constants remain for any non-Gateway-2.0 callers but are **not** emitted on Gateway 2.0 requests. 
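A minimal sketch of the header selection that the Q3 resolution and the header table describe: point operations carry a single EPK, feed / cross-partition operations carry the `x-ms-thinclient-range-min` / `-max` pair, and `x-ms-start-epk` / `x-ms-end-epk` are never emitted on this path. The `GATEWAY20_*` constant names and wire strings come from the table; `EpkTarget` and `gateway20_headers` are hypothetical helpers, the numeric operation/resource codes are passed in pre-encoded (assumption), and the JSON `x-ms-documentdb-partitionkey` header is omitted for brevity.

```rust
// Rust identifiers use the GATEWAY20_* prefix; the wire strings are the
// server-defined x-ms-thinclient-* names and do not change.
pub const GATEWAY20_OPERATION_TYPE: &str = "x-ms-thinclient-proxy-operation-type";
pub const GATEWAY20_RESOURCE_TYPE: &str = "x-ms-thinclient-proxy-resource-type";
pub const EFFECTIVE_PARTITION_KEY: &str = "x-ms-effective-partition-key";
pub const GATEWAY20_RANGE_MIN: &str = "x-ms-thinclient-range-min";
pub const GATEWAY20_RANGE_MAX: &str = "x-ms-thinclient-range-max";

/// Hypothetical routing target: one EPK (point op) or an EPK range (feed op).
pub enum EpkTarget {
    Point(String),
    Range { min: String, max: String },
}

/// Builds the Gateway 2.0 proxy headers for one request.
pub fn gateway20_headers(
    op_type: u32,
    resource_type: u32,
    target: &EpkTarget,
) -> Vec<(&'static str, String)> {
    // Emitted on every Gateway 2.0 request.
    let mut headers = vec![
        (GATEWAY20_OPERATION_TYPE, op_type.to_string()),
        (GATEWAY20_RESOURCE_TYPE, resource_type.to_string()),
    ];
    match target {
        // Point ops: single canonical EPK.
        EpkTarget::Point(epk) => headers.push((EFFECTIVE_PARTITION_KEY, epk.clone())),
        // Feed / cross-partition ops: un-padded range bounds.
        EpkTarget::Range { min, max } => {
            headers.push((GATEWAY20_RANGE_MIN, min.clone()));
            headers.push((GATEWAY20_RANGE_MAX, max.clone()));
        }
    }
    headers
}
```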
From b45fc05846333ff010193e3b95c5dfb61bab2ec9 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Mon, 27 Apr 2026 11:40:55 -0700 Subject: [PATCH 21/48] Remove personal-name references from Gateway 2.0 spec Drop @analogrelay attribution and the Kiran context pointer; replace with neutral phrasing. Also remove the names from the cspell:ignore directive. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index ffc00d4b09b..9b0f6e5af7c 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -1,4 +1,4 @@ - + # Gateway 2.0 Design Spec for Rust Driver & SDK **Status**: Draft / Iterating @@ -131,7 +131,7 @@ Gateway 2.0 reuses the standard retry pipeline. Two status codes have Gateway-2. - **Few attempts** (≤ 3) with **exponential backoff** between them (separate, looser schedule from the one used for 410/Gone). - Retry against the **same** endpoint; do not switch regions on 449. - **Do not** share the retry budget with 410/Gone — 449 has its own dedicated policy. -- **Gateway V1 uses the identical 449 policy.** When the server-side adds a "suppress 449" capability (under design with the server team — see Kiran for context), the client negotiates it via the `SdkSupportedCapabilities` channel and treats 449 as a non-retryable terminal error. Until that capability ships, both transports retry per the policy above. +- **Gateway V1 uses the identical 449 policy.** When the server-side adds a "suppress 449" capability (under design with the server team), the client negotiates it via the `SdkSupportedCapabilities` channel and treats 449 as a non-retryable terminal error. 
Until that capability ships, both transports retry per the policy above. ### 4.2 HTTP 404 with sub-status `1002` (`PARTITION_KEY_RANGE_GONE`) @@ -360,7 +360,7 @@ For Rust: thread the resolved consistency value through the pipeline as an expli #### Range header wire format -EPK range headers (`x-ms-thinclient-range-min` / `-max`) carry the canonical, un-padded hex produced by `EffectivePartitionKey::compute_range()`. **Do not** zero-pad to N×32 on the wire. Local comparisons use `EffectivePartitionKey`'s `Ord` / `cmp` impl, which correctly handles the mixed-length boundaries returned by the backend; the `epk_cmp_*` tests in `container_routing_map.rs` (around L625–665) pin this behavior. The comparator is consumed via `binary_search_by(|r| r.min_inclusive.cmp(&epk_val))` (≈L282 of the same file). `@analogrelay`'s earlier zero-padding proposal in PR #4087 (commit `25233c903`) was **not** adopted; stay consistent with the length-aware convention. +EPK range headers (`x-ms-thinclient-range-min` / `-max`) carry the canonical, un-padded hex produced by `EffectivePartitionKey::compute_range()`. **Do not** zero-pad to N×32 on the wire. Local comparisons use `EffectivePartitionKey`'s `Ord` / `cmp` impl, which correctly handles the mixed-length boundaries returned by the backend; the `epk_cmp_*` tests in `container_routing_map.rs` (around L625–665) pin this behavior. The comparator is consumed via `binary_search_by(|r| r.min_inclusive.cmp(&epk_val))` (≈L282 of the same file). An earlier zero-padding proposal in PR #4087 (commit `25233c903`) was **not** adopted; stay consistent with the length-aware convention. > **`Range` semantics pitfall** (from PR #4087): `compute_range` returns a Rust `std::ops::Range` where `start == end` denotes a **point operation**. Standard `Range` iteration treats that as empty, so code that uses `.contains()` or iterates the range directly will misbehave. Always treat `start == end` as the point case explicitly. 
From 81f51c63c642ef9ee1e74a7deb5c13b25803b0b8 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Mon, 27 Apr 2026 13:11:41 -0700 Subject: [PATCH 22/48] Rename prefer_gateway20 to gateway20_suppressed MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Per the negative-term naming rule (default values mean Gateway 2.0 enabled), rename the per-request gating flag and invert its logic. Update §3.1 formula, the Phase 1 request-flow step, and inline references. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index 9b0f6e5af7c..3d5395f4ce3 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -89,14 +89,14 @@ Gateway 2.0 moves **replica-level** routing intelligence from the SDK into the s Gateway 2.0 routing is decided **once per request** in the driver's `resolve_endpoint` stage. After that, downstream pipeline stages MUST trust `RoutingDecision.transport_mode` and not re-derive eligibility. -### 3.1 The `prefer_gateway20` formula +### 3.1 The `gateway20_suppressed` formula ```text -prefer_gateway20 = !options.gateway20_disabled - && account.has_thin_client_endpoints() +gateway20_suppressed = options.gateway20_disabled + || !account.has_thin_client_endpoints() ``` -The account-side check (`has_thin_client_endpoints()`) reads the cached account metadata. The client-side check (`gateway20_disabled`) is the only public toggle. +When `gateway20_suppressed` is `false` (the default whenever the account advertises Gateway 2.0 endpoints and the operator has not flipped the override), the request routes through Gateway 2.0. When it is `true`, the request falls through to Gateway V1. 
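The negative-term formula can be sketched directly. `ClientOptions` and `AccountMetadata` below are hypothetical stand-ins for `CosmosClientOptions` and the cached account properties, and treating "has thin-client endpoints" as "either location list is non-empty" is an assumption. The point of the sketch is that `Default` values (`gateway20_disabled == false`) leave Gateway 2.0 enabled whenever the account advertises it.

```rust
/// Hypothetical stand-in for the public client options.
#[derive(Debug, Default)]
pub struct ClientOptions {
    /// Negative-term flag: `false` (the default) keeps Gateway 2.0 enabled.
    pub gateway20_disabled: bool,
}

/// Hypothetical stand-in for cached account metadata.
pub struct AccountMetadata {
    pub thin_client_readable_locations: Vec<String>,
    pub thin_client_writable_locations: Vec<String>,
}

impl AccountMetadata {
    /// Assumption: the account advertises Gateway 2.0 if either list is non-empty.
    pub fn has_thin_client_endpoints(&self) -> bool {
        !self.thin_client_readable_locations.is_empty()
            || !self.thin_client_writable_locations.is_empty()
    }
}

/// `true` => route through Gateway V1; `false` => route through Gateway 2.0.
pub fn gateway20_suppressed(options: &ClientOptions, account: &AccountMetadata) -> bool {
    options.gateway20_disabled || !account.has_thin_client_endpoints()
}
```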
The account-side check (`has_thin_client_endpoints()`) reads the cached account metadata. The client-side check (`gateway20_disabled`) is the only public toggle. ### 3.2 Operator override: `CosmosClientOptions::gateway20_disabled` @@ -159,7 +159,7 @@ Beyond 449 and 404/1002, Gateway 2.0 follows the timeout/408 handling defined in 1. `ContainerClient::create_item(partition_key, item, options)` calls into `ContainerClient` 2. `container_connection.rs` serializes `T` to `&[u8]`; EPK computation is deferred to the driver (via `EffectivePartitionKey::compute()` / `::compute_range()`), which then resolves PKRange 3. `CosmosDriver::execute_operation()` enters the Operation Pipeline (7-stage loop) -4. `resolve_endpoint()` prefers gateway 2.0 endpoint (if `prefer_gateway20` per §3.1) +4. `resolve_endpoint()` prefers gateway 2.0 endpoint when `!gateway20_suppressed` per §3.1 5. Transport Pipeline checks `is_operation_supported_by_gateway20()`: - **YES**: Inject gateway 2.0 headers + RNTBD serialize → HTTP/2 POST to Gateway 2.0 Proxy (SLA) - **NO**: Standard HTTP/REST request to Cosmos Gateway (eligibility fallback — per-request, deterministic) From 8be703708a2bb4434a6dd7434be020383fce9de0 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Wed, 29 Apr 2026 11:45:10 -0700 Subject: [PATCH 23/48] Address round 8 PR review comments on GATEWAY_20_SPEC MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Remove firewall-rules bullet and table row (thread #4 — port-based firewall framing applies to all transports, not Gateway 2.0 specifically). - Remove HTTP/2 multiplexing bullet and tighten protocol cell (thread #5 — multiplexing is a transport feature shared with Gateway V1, not a Gateway-2.0-specific benefit). 
- Reword §3 routing-decision scope: 'once per request' -> 'once per logical operation, inherited by retries and sub-requests' (thread #6) — prevents mid-operation transport-mode flips that would fragment diagnostics and break session-token affinity. - §4.1 449 retry: introduce a new ThrottleAction::RetryWith { delay, new_state: RetryWithState } variant in driver/pipeline/components.rs and extend decide_throttle_action in transport_pipeline.rs (thread #7), guaranteeing structurally that the 449 budget is independent of the 410/Gone and 429 budgets. - §4.2 sub-status correction (thread #8): the parent status is 404, and the sub-status name is READ_SESSION_NOT_AVAILABLE (not PARTITION_KEY_RANGE_GONE — that name belongs to 410/1002). Body rewritten to reflect the session-token-stale semantics: existing 404 retry path applies; the only Gateway-2.0-specific deviation is that we do NOT refresh the PKRange cache on 404/1002. Test-matrix row updated. - Test coverage: add HPK + Gateway 2.0 row exercising full vs partial PK forms (full -> x-ms-effective-partition-key, partial -> x-ms-thinclient- range-min / -max via EffectivePartitionKey::compute_range()). 
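The §4.1 change can be sketched as a standalone model of the extended `ThrottleAction`. The variant shapes follow the spec text; the `attempts` fields, the 10 ms base delay, the 429 cap of 9, and the simplified `decide_throttle_action` signature (status only, caller-supplied `retry_after`) are all assumptions for illustration. Because each arm reads and advances only its own state struct, an exhausted 449 budget cannot block a 429 retry, and vice versa.

```rust
use std::time::Duration;

/// 429 budget (fields illustrative).
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct ThrottleRetryState {
    pub attempts: u32,
}

/// Dedicated 449 budget; never shares counters with ThrottleRetryState.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct RetryWithState {
    pub attempts: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ThrottleAction {
    Retry { delay: Duration, new_state: ThrottleRetryState },
    RetryWith { delay: Duration, new_state: RetryWithState },
    Propagate,
}

// Assumed tuning: the spec fixes only "at most 3 retries, exponential backoff".
const MAX_RETRY_WITH_ATTEMPTS: u32 = 3;
const RETRY_WITH_BASE_DELAY: Duration = Duration::from_millis(10);
// Hypothetical 429 cap, unrelated to the 449 budget.
const MAX_THROTTLE_ATTEMPTS: u32 = 9;

pub fn decide_throttle_action(
    status: u16,
    throttle: ThrottleRetryState,
    retry_with: RetryWithState,
    retry_after: Duration,
) -> ThrottleAction {
    match status {
        // 449: only RetryWithState is consulted; backoff is base, 2x, 4x.
        449 if retry_with.attempts < MAX_RETRY_WITH_ATTEMPTS => ThrottleAction::RetryWith {
            delay: RETRY_WITH_BASE_DELAY * 2u32.pow(retry_with.attempts),
            new_state: RetryWithState { attempts: retry_with.attempts + 1 },
        },
        // 429: only ThrottleRetryState is consulted.
        429 if throttle.attempts < MAX_THROTTLE_ATTEMPTS => ThrottleAction::Retry {
            delay: retry_after,
            new_state: ThrottleRetryState { attempts: throttle.attempts + 1 },
        },
        _ => ThrottleAction::Propagate,
    }
}
```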
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../docs/GATEWAY_20_SPEC.md | 30 ++++++++++++------- 1 file changed, 20 insertions(+), 10 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index 3d5395f4ce3..8a2300d30b0 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -47,9 +47,7 @@ Today the Rust SDK only supports Gateway V1: a shared, stateless HTTP/REST proxy ### Key Benefits - **SLA latency guarantees** — Unlike Gateway V1, Gateway 2.0 plans to provide contractual latency commitments comparable to direct mode -- **Simplified networking** — Clients connect to a single regional proxy endpoint over HTTP/2 on a non-443 port (e.g. 1125). The endpoint IP set is dynamic and may shift across Gateway 2.0 federations; firewall rules MUST scope by hostname/port, not by specific IPs. - **Reduced operational cost** — The proxy handles replica discovery, connection management, and replica-level routing within a partition; the client stays partition-aware (resolves PKRange, computes EPK) but is **never replica-aware**, avoiding the COGS and customer-side debugging burden of maintaining one connection per backend replica. -- **HTTP/2 multiplexing** — Multiple concurrent operations share a single TCP connection. Note that Gateway V1 also supports HTTP/2 with multiplexing today; the connection-overhead win is relative to direct mode's per-replica TCP connections, not relative to Gateway V1. 
- **Transparent failover** — The proxy handles replica failover within a partition; the SDK handles regional failover across proxy endpoints ### Design Philosophy @@ -76,18 +74,17 @@ Gateway 2.0 moves **replica-level** routing intelligence from the SDK into the s | --- | --- | --- | --- | | Latency SLA | No | **Yes** | Yes | | Simple Network | Yes | Yes | No | -| Protocol | REST/HTTP, HTTP/2 + multiplexing supported | RNTBD message encoding over HTTP/2 | RNTBD over TCP | +| Protocol | REST/HTTP over HTTP/2 | RNTBD message encoding over HTTP/2 | RNTBD over TCP | | Replica Mgmt | Gateway/Proxy | Proxy | SDK | | Partition Route | Gateway/Proxy | Proxy | SDK | | Regional Route | SDK | SDK | SDK | | Operational Cost (COGS + debug) | Low | Low | High | -| Firewall Rules | 1 endpoint (443) | 1 endpoint (non-443, e.g. 1125; dynamic IPs) | N replicas | --- ## 3. Gating, Configuration & Override -Gateway 2.0 routing is decided **once per request** in the driver's `resolve_endpoint` stage. After that, downstream pipeline stages MUST trust `RoutingDecision.transport_mode` and not re-derive eligibility. +Gateway 2.0 routing is decided **once per logical operation** (a point operation, a single query-iteration page, a single batch, etc.) in the driver's `resolve_endpoint` stage. The resulting `RoutingDecision.transport_mode` is then attached to the operation context and **inherited by all retries and sub-requests** of that operation — downstream pipeline stages, retry policies, and hedged sub-requests MUST trust the attached decision and not re-derive eligibility. (Re-evaluating `gateway20_suppressed` mid-operation could route a retry through a different transport than the original attempt, fragmenting diagnostics and breaking session-token affinity.) ### 3.1 The `gateway20_suppressed` formula @@ -128,14 +125,26 @@ Gateway 2.0 reuses the standard retry pipeline. Two status codes have Gateway-2. 449 typically signals a server-side precondition failure (e.g. 
etag conflict on a single document being updated by many writers). Aggressive retry loops on 449 amplify pathological client patterns — for example, a numeric-ID generator where dozens or hundreds of threads patch the same document. The retry policy MUST optimize for the natural case (precondition failure that resolves with a brief wait) without amplifying abuse. -- **Few attempts** (≤ 3) with **exponential backoff** between them (separate, looser schedule from the one used for 410/Gone). +The 449 retry rules are expressed as a **new `ThrottleAction` variant** that this spec +introduces. The existing `ThrottleAction` enum in `driver/pipeline/components.rs` (today: +`Retry { delay, new_state: ThrottleRetryState } | Propagate`) gains a third variant — +`RetryWith { delay, new_state: RetryWithState }` — and `decide_throttle_action` in +`driver/transport/transport_pipeline.rs` is extended to emit that variant whenever the +resolved `(status_code, sub_status)` pair is `(449, *)`. The new `RetryWithState` struct +owns the 449-specific retry budget and is **distinct from** `ThrottleRetryState`, which +guarantees structurally that 449 retries cannot consume the 410/Gone or 429 budget (and +vice versa). + +- **Few attempts** (≤ 3) with **exponential backoff** between them. - Retry against the **same** endpoint; do not switch regions on 449. -- **Do not** share the retry budget with 410/Gone — 449 has its own dedicated policy. +- **Independent retry budget** — `RetryWithState` does not share counters with + `ThrottleRetryState` or the 410/Gone retry path. A 449 followed by a 410 on the same + logical operation gets fresh budget on the 410 side, and vice versa. - **Gateway V1 uses the identical 449 policy.** When the server-side adds a "suppress 449" capability (under design with the server team), the client negotiates it via the `SdkSupportedCapabilities` channel and treats 449 as a non-retryable terminal error. 
Until that capability ships, both transports retry per the policy above. -### 4.2 HTTP 404 with sub-status `1002` (`PARTITION_KEY_RANGE_GONE`) +### 4.2 HTTP 404 (Not Found) with sub-status `1002` (`READ_SESSION_NOT_AVAILABLE`) -Refresh the PKRange cache, then retry. **Always prefer a remote region for the retry** when one is available in the client's preferred-region list — the local region is suspected of carrying the stale routing, so pinning the retry to the same Gateway 2.0 endpoint that just returned 1002 reproduces the bug. **PLF takes precedence**: if PLF (per `PARTITION_LEVEL_FAILOVER_SPEC.md`) has already pinned a region for this PKRangeId, the PLF region wins over the "prefer remote" hint. +Follow the existing 404 / `READ_SESSION_NOT_AVAILABLE` retry path defined in `TRANSPORT_PIPELINE_SPEC.md` (re-read the session token from another replica, retry per the standard session-retry budget). The **only** Gateway-2.0-specific deviation: do **not** refresh the PKRange cache on a 404/1002 — the partition routing is not the suspected cause of the stale-session condition, so refreshing the routing map would be wasted churn. (Note that `1002` is reused with different meaning under parent status `410` (`PARTITION_KEY_RANGE_GONE`), which is handled by the existing 410/Gone retry path and is unchanged on Gateway 2.0.) These rules apply uniformly to V1 (HTTP) and V2 (RNTBD) — the retry policy operates on the resolved `(status_code, sub_status)` pair before the transport-specific deserializer ever sees the body. @@ -543,6 +552,7 @@ A **new dedicated CI pipeline** is required for gateway 2.0 live tests. 
Gateway | EPK computation | Yes | | | Single/hierarchical PK, hash versions 1 and 2, error cases (MultiHash V1, wrong component count) | | Operation filtering | Yes | | | All ResourceType × OperationType combos; asserts StoredProc Execute is rejected | | Header injection | Yes | | | Point vs feed EPK headers, proxy type headers, range-header un-padded form | +| HPK + Gateway 2.0: full vs partial PK | Yes | | Yes | Hierarchical container (2- and 3-component PK paths). **Full PK** (all components specified) on a point op → emits `x-ms-effective-partition-key` carrying the single EPK from `EffectivePartitionKey::compute()`. **Partial PK** (1- or 2-component prefix) on a feed / cross-partition / delete-by-PK op → emits `x-ms-thinclient-range-min` / `x-ms-thinclient-range-max` carrying the EPK range from `EffectivePartitionKey::compute_range()`. Asserted at unit level (header presence + exact wire form, range bounds for each prefix length) and E2E (round-trip against a live HPK container). | | Account-name RNTBD token | Yes | | | `GlobalDatabaseAccountName` (`0x00CE`, `String`) present in the RNTBD metadata stream of every Gateway 2.0 request (point, feed, batch, bulk, change feed). Value matches the host label of the account endpoint URL. | | SDK-supported-capabilities header | Yes | | | `x-ms-cosmos-sdk-supportedcapabilities` value emitted is the bitmask string for `(PartitionMerge \| IgnoreUnknownRntbdTokens)`, **not** `"0"`. Pin against the integer value sourced from .NET `SDKSupportedCapabilities.cs`. | | Consistency reconciliation: token + header encoding | Yes | | | RNTBD token `0x00F0` Byte round-trip for all 4 strategies; HTTP header `x-ms-cosmos-read-consistency-strategy` exact wire-string mapping for all 4 strategies; `Default` emits neither carrier on either transport. | @@ -562,7 +572,7 @@ A **new dedicated CI pipeline** is required for gateway 2.0 live tests. 
Gateway | Retry: 449 Retry-With | | Yes | | Dedicated 449 policy (≤ 3 attempts, exponential backoff, separate budget from 410/Gone), same Gateway 2.0 endpoint, no region switch, no fallback to Gateway V1 | | Retry: 503 | | Yes | | Regional failover via existing retry policies | | Retry: 410 Gone | | Yes | | PKRange refresh (sub-status specific); NameCacheStale → collection cache | -| Retry: 404 / sub-status 1002 (PartitionKeyRangeGone) | | Yes | | PKRange cache refresh + retry against **remote-preferred** region; assert local-region retry only when no other region available; assert PLF region wins when PLF has pinned the PKRangeId | +| Retry: 404 / sub-status 1002 (ReadSessionNotAvailable) | | Yes | | Existing 404 / `READ_SESSION_NOT_AVAILABLE` retry path runs (re-read session token from another replica); assert that **no PKRange cache refresh** is triggered | | Operator override (`gateway20_disabled = true`) | Yes | Yes | | All eligible Document ops (point + feed + batch + change feed) route through standard gateway; default `false` does not change behavior | | Eligibility fallback | | Yes | | StoredProc Execute → standard gateway | | PLF precedence | | Yes | | Region without gw20_url + PLF override → standard gateway path | From 9c7484ef3b28cc97e94824272a6377ee77dd397c Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Wed, 29 Apr 2026 12:20:21 -0700 Subject: [PATCH 24/48] Restore prefer-remote-region wording for 404/1002 retry MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replace the 'follow existing 404 path' body in §4.2 with the prior 'prefer remote region + PLF precedence' wording. Update the matching test-matrix row to assert prefer-remote routing, PLF override, and the no-PKRange-refresh invariant on 404/1002. 
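The restored §4.2 precedence (PLF pin first, then any preferred region other than the local one, then the local region only as a last resort) reduces to a small selection function. `select_retry_region` is a hypothetical helper, regions are modeled as plain strings, and PLF state is reduced to an optional pinned region; the function deliberately touches no PKRange cache, matching the no-refresh invariant.

```rust
/// Hypothetical helper: choose the region for retrying a 404/1002.
/// Precedence: PLF-pinned region > first non-local preferred region > local.
pub fn select_retry_region<'a>(
    preferred_regions: &[&'a str],
    local_region: &'a str,
    plf_pinned: Option<&'a str>,
) -> &'a str {
    // PLF wins over the "prefer remote" hint.
    if let Some(pinned) = plf_pinned {
        return pinned;
    }
    // Otherwise prefer any region other than the one that just returned 1002,
    // falling back to the local region only when nothing else is available.
    preferred_regions
        .iter()
        .copied()
        .find(|r| *r != local_region)
        .unwrap_or(local_region)
}
```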
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index 8a2300d30b0..dc09653487d 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -144,7 +144,7 @@ vice versa). ### 4.2 HTTP 404 (Not Found) with sub-status `1002` (`READ_SESSION_NOT_AVAILABLE`) -Follow the existing 404 / `READ_SESSION_NOT_AVAILABLE` retry path defined in `TRANSPORT_PIPELINE_SPEC.md` (re-read the session token from another replica, retry per the standard session-retry budget). The **only** Gateway-2.0-specific deviation: do **not** refresh the PKRange cache on a 404/1002 — the partition routing is not the suspected cause of the stale-session condition, so refreshing the routing map would be wasted churn. (Note that `1002` is reused with different meaning under parent status `410` (`PARTITION_KEY_RANGE_GONE`), which is handled by the existing 410/Gone retry path and is unchanged on Gateway 2.0.) +**Always prefer a remote region for the retry** when one is available in the client's preferred-region list — the local region is suspected of carrying the stale routing, so pinning the retry to the same Gateway 2.0 endpoint that just returned 1002 reproduces the bug. **PLF takes precedence**: if PLF (per `PARTITION_LEVEL_FAILOVER_SPEC.md`) has already pinned a region for this PKRangeId, the PLF region wins over the "prefer remote" hint. These rules apply uniformly to V1 (HTTP) and V2 (RNTBD) — the retry policy operates on the resolved `(status_code, sub_status)` pair before the transport-specific deserializer ever sees the body. @@ -572,7 +572,7 @@ A **new dedicated CI pipeline** is required for gateway 2.0 live tests. 
Gateway | Retry: 449 Retry-With | | Yes | | Dedicated 449 policy (≤ 3 attempts, exponential backoff, separate budget from 410/Gone), same Gateway 2.0 endpoint, no region switch, no fallback to Gateway V1 | | Retry: 503 | | Yes | | Regional failover via existing retry policies | | Retry: 410 Gone | | Yes | | PKRange refresh (sub-status specific); NameCacheStale → collection cache | -| Retry: 404 / sub-status 1002 (ReadSessionNotAvailable) | | Yes | | Existing 404 / `READ_SESSION_NOT_AVAILABLE` retry path runs (re-read session token from another replica); assert that **no PKRange cache refresh** is triggered | +| Retry: 404 / sub-status 1002 (ReadSessionNotAvailable) | | Yes | | Retry routes to a **remote-preferred** region (assert local-region retry only when no other region is available); assert PLF region wins when PLF has pinned the PKRangeId; assert that **no PKRange cache refresh** is triggered | | Operator override (`gateway20_disabled = true`) | Yes | Yes | | All eligible Document ops (point + feed + batch + change feed) route through standard gateway; default `false` does not change behavior | | Eligibility fallback | | Yes | | StoredProc Execute → standard gateway | | PLF precedence | | Yes | | Region without gw20_url + PLF override → standard gateway path | From 8363209848e737ed1d791599ef55e70abebaf0a5 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Wed, 29 Apr 2026 18:13:40 -0700 Subject: [PATCH 25/48] Add Gateway 2.0 RNTBD wire format (Slice 1) Introduces the rntbd module: request frame serializer, response frame deserializer, token codecs (Byte/UShort/ULong/Long/ULongLong/LongLong/Guid/ SmallString/String/ULongString/SmallBytes/Bytes/ULongBytes/Float/Double), and HTTP-status mapping with optional sub-status enrichment. Activity-Id is encoded as the 16-byte [u64 LE msb][u64 LE lsb] pair on the wire (matching Java/.NET RNTBD), distinct from the metadata Guid token encoding (MS GUID fields LE via Uuid::as_fields). 
The capabilities header advertises bitmask "9" (PartitionMerge | IgnoreUnknownRntbdTokens). Unlike Java, which advertises "11" (also ChangeFeedWithStartTimeFromBeginning), Rust intentionally skips that bit in Slice 1 because the underlying behavior is not yet implemented; advertising an unimplemented capability would violate the contract. Per AGENTS.md ("Prefer Result::Err over panicking"), serialize() and the underlying token write_to / write_len_prefixed_* helpers all return azure_core::Result so oversized inputs (frame > u32, SmallString > 255, String > 64 KB) surface as data-conversion errors instead of panicking. The module is gated by #[allow(dead_code, unused_imports)] and is not yet wired into the transport pipeline; Slice 2 will add the dispatcher and operation-eligibility filter. Implements R1, R2, R3 and AC1-AC4 from .coding-harness/spec.json, mapping to GATEWAY_20_SPEC.md §5 Phase 1. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../src/driver/transport/cosmos_headers.rs | 44 +- .../src/driver/transport/mod.rs | 1 + .../src/driver/transport/rntbd/mod.rs | 21 + .../src/driver/transport/rntbd/request.rs | 224 ++++++ .../src/driver/transport/rntbd/response.rs | 266 ++++++++ .../src/driver/transport/rntbd/status.rs | 46 ++ .../src/driver/transport/rntbd/tokens.rs | 637 ++++++++++++++++++ 7 files changed, 1238 insertions(+), 1 deletion(-) create mode 100644 sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/mod.rs create mode 100644 sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/request.rs create mode 100644 sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/response.rs create mode 100644 sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/status.rs create mode 100644 sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/tokens.rs diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/cosmos_headers.rs
b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/cosmos_headers.rs index 734b175a2ac..6f9eeab5d2d 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/cosmos_headers.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/cosmos_headers.rs @@ -13,7 +13,15 @@ const APPLICATION_JSON: HeaderValue = HeaderValue::from_static("application/json const VERSION: HeaderName = HeaderName::from_static("x-ms-version"); const SDK_SUPPORTED_CAPABILITIES: HeaderName = HeaderName::from_static("x-ms-cosmos-sdk-supportedcapabilities"); -const SUPPORTED_CAPABILITIES_VALUE: &str = "0"; +const PARTITION_MERGE_BIT: u32 = 1; +const IGNORE_UNKNOWN_RNTBD_TOKENS_BIT: u32 = 8; +const SUPPORTED_CAPABILITIES_BITS: u32 = PARTITION_MERGE_BIT | IGNORE_UNKNOWN_RNTBD_TOKENS_BIT; +const _: () = assert!(SUPPORTED_CAPABILITIES_BITS == 9); +/// String-encoded SDK capabilities bitmask. +/// +/// Derived from `PartitionMerge` (1) | `IgnoreUnknownRntbdTokens` (8), which +/// advertises Gateway 2.0 forward compatibility with unknown RNTBD tokens. 
+const SUPPORTED_CAPABILITIES_VALUE: &str = "9"; const CACHE_CONTROL: HeaderName = HeaderName::from_static("cache-control"); const NO_CACHE: HeaderValue = HeaderValue::from_static("no-cache"); @@ -40,3 +48,37 @@ pub(crate) fn apply_cosmos_headers(request: &mut HttpRequest, user_agent: &Heade request.headers.insert(CACHE_CONTROL, NO_CACHE.clone()); request.headers.insert(USER_AGENT, user_agent.clone()); } + +#[cfg(test)] +mod tests { + use super::*; + use azure_core::http::{headers::Headers, Method}; + use url::Url; + + #[test] + fn applies_supported_capabilities_bitmask() { + let mut request = HttpRequest { + url: Url::parse("https://example.documents.azure.com/").unwrap(), + method: Method::Get, + headers: Headers::new(), + body: None, + timeout: None, + #[cfg(feature = "fault_injection")] + evaluation_collector: None, + }; + let user_agent = HeaderValue::from_static("test-agent"); + + apply_cosmos_headers(&mut request, &user_agent); + + assert_eq!( + SUPPORTED_CAPABILITIES_VALUE.parse::<u32>().unwrap(), + PARTITION_MERGE_BIT | IGNORE_UNKNOWN_RNTBD_TOKENS_BIT + ); + assert_eq!( + request + .headers + .get_optional_str(&SDK_SUPPORTED_CAPABILITIES), + Some(SUPPORTED_CAPABILITIES_VALUE) + ); + } +} diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/mod.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/mod.rs index 790ae170707..7c34dc53576 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/mod.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/mod.rs @@ -24,6 +24,7 @@ pub(crate) mod http_client_factory; pub(crate) mod request_signing; #[cfg(feature = "reqwest")] pub(crate) mod reqwest_transport_client; +pub(crate) mod rntbd; mod sharded_transport; pub(crate) use sharded_transport::EndpointKey; mod tracked_transport; diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/mod.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/mod.rs new file mode 100644 index
00000000000..b0c2571448c --- /dev/null +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/mod.rs @@ -0,0 +1,21 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +//! Gateway 2.0 RNTBD wire-format support. +//! +//! This module owns in-memory request serialization and response deserialization +//! for RNTBD frames carried by the Gateway 2.0 transport path. + +// Slice 1 intentionally lands the wire-format module before later slices wire it +// into the transport pipeline. +#![allow(dead_code, unused_imports)] + +pub(crate) mod request; +pub(crate) mod response; +pub(crate) mod status; +pub(crate) mod tokens; + +pub(crate) use request::RntbdRequestFrame; +pub(crate) use response::RntbdResponse; +pub(crate) use status::map_rntbd_status_to_cosmos_status; +pub(crate) use tokens::{Token, TokenType, TokenValue}; diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/request.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/request.rs new file mode 100644 index 00000000000..dffb829ced6 --- /dev/null +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/request.rs @@ -0,0 +1,224 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +//! RNTBD request frame serialization. + +use uuid::Uuid; + +use crate::models::{OperationType, ResourceType}; + +use super::tokens::{ + data_conversion_error, write_uuid_le, RntbdOperationType, RntbdResourceType, Token, +}; + +/// A Gateway 2.0 RNTBD request frame. +/// +/// The body is schema-agnostic raw bytes. When [`body`](Self::body) is present, +/// serialization emits the payload length followed by the payload bytes. +#[derive(Clone, Debug, PartialEq)] +pub(crate) struct RntbdRequestFrame { + /// Resource type encoded into the frame header. + pub(crate) resource_type: ResourceType, + /// Operation type encoded into the frame header. 
+ pub(crate) operation_type: OperationType, + /// Activity identifier encoded as two little-endian `u64` values. + pub(crate) activity_id: Uuid, + /// Metadata token stream. + pub(crate) metadata: Vec<Token>, + /// Optional raw request payload. + pub(crate) body: Option<Vec<u8>>, +} + +impl RntbdRequestFrame { + /// Serializes the request frame to Gateway 2.0 RNTBD bytes. + /// + /// The total length field is inclusive of its own four bytes. Returns + /// [`ErrorKind::DataConversion`] when an input exceeds an RNTBD wire + /// length limit (e.g., a metadata token value longer than the + /// `SmallString` length prefix supports, a body larger than `u32::MAX`, + /// or a frame whose total length exceeds `u32::MAX`). + /// + /// [`ErrorKind::DataConversion`]: azure_core::error::ErrorKind::DataConversion + pub(crate) fn serialize(&self) -> azure_core::Result<Vec<u8>> { + let metadata_len: usize = self.metadata.iter().map(Token::encoded_len).sum(); + let body_len = self.body.as_ref().map_or(0, |body| 4 + body.len()); + let total_len = 24 + metadata_len + body_len; + let total_len_u32 = u32::try_from(total_len).map_err(|_| { + data_conversion_error(format!( + "RNTBD request frame length {total_len} exceeds u32::MAX" + )) + })?; + + let mut out = Vec::with_capacity(total_len); + out.extend_from_slice(&total_len_u32.to_le_bytes()); + out.extend_from_slice( + &RntbdResourceType::from(self.resource_type) + .value() + .to_le_bytes(), + ); + out.extend_from_slice( + &RntbdOperationType::from(self.operation_type) + .value() + .to_le_bytes(), + ); + write_uuid_le(&mut out, self.activity_id); + + for token in &self.metadata { + token.write_to(&mut out)?; + } + + if let Some(body) = &self.body { + let body_len = u32::try_from(body.len()).map_err(|_| { + data_conversion_error(format!( + "RNTBD request payload length {} exceeds u32::MAX", + body.len() + )) + })?; + out.extend_from_slice(&body_len.to_le_bytes()); + out.extend_from_slice(body); + } + + debug_assert_eq!(out.len(), total_len); + Ok(out) + } +}
+ +#[cfg(test)] +mod tests { + use super::*; + use crate::driver::transport::rntbd::tokens::{ + data_conversion_error, read_u16_le, read_u32_le, read_uuid_le, RntbdOperationType, + RntbdResourceType, TokenValue, + }; + + #[test] + fn request_frames_round_trip_for_slice_one_operations() { + let operations = [ + OperationType::Create, + OperationType::Read, + OperationType::ReadFeed, + OperationType::Replace, + OperationType::Delete, + OperationType::Upsert, + OperationType::Query, + OperationType::SqlQuery, + OperationType::Head, + OperationType::HeadFeed, + OperationType::Batch, + ]; + + for operation_type in operations { + for body in [None, Some(vec![0x7b, 0x7d])] { + let frame = RntbdRequestFrame { + resource_type: ResourceType::Document, + operation_type, + activity_id: Uuid::from_u128(0x1234_5678_90ab_cdef_0123_4567_89ab_cdef), + metadata: Vec::new(), + body, + }; + + let bytes = frame.serialize().unwrap(); + let parsed = parse_request_for_tests(&bytes, frame.body.is_some()).unwrap(); + + assert_eq!(parsed, frame); + } + } + } + + #[test] + fn query_plan_uses_sql_query_wire_id_until_metadata_rules_land() { + let frame = RntbdRequestFrame { + resource_type: ResourceType::Document, + operation_type: OperationType::QueryPlan, + activity_id: Uuid::nil(), + metadata: Vec::new(), + body: None, + }; + + let bytes = frame.serialize().unwrap(); + let operation_id = u16::from_le_bytes([bytes[6], bytes[7]]); + + // QueryPlan has no distinct Java RNTBD operation ID. Slice 2 will add + // the metadata that disambiguates query-plan requests from SqlQuery. 
+ assert_eq!( + operation_id, + RntbdOperationType::from(OperationType::SqlQuery).value() + ); + } + + #[test] + fn metadata_tokens_are_serialized_between_header_and_body() { + let frame = RntbdRequestFrame { + resource_type: ResourceType::Document, + operation_type: OperationType::Read, + activity_id: Uuid::nil(), + metadata: vec![Token::new(0x00CE, TokenValue::String("account".to_owned()))], + body: None, + }; + + let bytes = frame.serialize().unwrap(); + let parsed = parse_request_for_tests(&bytes, false).unwrap(); + + assert_eq!(parsed, frame); + } + + #[test] + fn serialize_returns_error_when_small_string_exceeds_u8_length_prefix() { + let oversized = "a".repeat(256); + let frame = RntbdRequestFrame { + resource_type: ResourceType::Document, + operation_type: OperationType::Read, + activity_id: Uuid::nil(), + metadata: vec![Token::new(0x0001, TokenValue::SmallString(oversized))], + body: None, + }; + + let err = frame.serialize().unwrap_err(); + assert_eq!(*err.kind(), azure_core::error::ErrorKind::DataConversion); + } + + fn parse_request_for_tests( + bytes: &[u8], + has_body: bool, + ) -> azure_core::Result<RntbdRequestFrame> { + let mut src = bytes; + let total_len = read_u32_le(&mut src)? as usize; + if total_len != bytes.len() { + return Err(data_conversion_error(format!( + "request frame length {total_len} did not match buffer length {}", + bytes.len() + ))); + } + + let resource_type = + ResourceType::try_from(RntbdResourceType::try_from(read_u16_le(&mut src)?)?)?; + let operation_type = + OperationType::try_from(RntbdOperationType::try_from(read_u16_le(&mut src)?)?)?; + let activity_id = read_uuid_le(&mut src)?; + + let mut metadata = Vec::new(); + let body = if has_body { + let payload_len = read_u32_le(&mut src)?
as usize; + if src.len() != payload_len { + return Err(data_conversion_error(format!( + "request payload length {payload_len} did not match remaining bytes {}", + src.len() + ))); + } + Some(src.to_vec()) + } else { + while !src.is_empty() { + metadata.push(Token::read_from(&mut src)?); + } + None + }; + + Ok(RntbdRequestFrame { + resource_type, + operation_type, + activity_id, + metadata, + body, + }) + } +} diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/response.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/response.rs new file mode 100644 index 00000000000..db2cacf3906 --- /dev/null +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/response.rs @@ -0,0 +1,266 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +//! RNTBD response frame deserialization. + +use uuid::Uuid; + +use crate::models::CosmosStatus; + +use super::{ + status::map_rntbd_status_to_cosmos_status, + tokens::{ + data_conversion_error, read_u32_le, read_uuid_le, RntbdResponseToken, Token, TokenValue, + }, +}; + +/// A decoded Gateway 2.0 RNTBD response frame. +/// +/// The body is schema-agnostic raw bytes. Recognized metadata tokens are surfaced +/// as typed optional fields; unknown token IDs are silently consumed. +#[derive(Clone, Debug, PartialEq)] +pub(crate) struct RntbdResponse { + /// Status composed from the frame HTTP status and optional SubStatus token. + pub(crate) status: CosmosStatus, + /// Activity identifier echoed by the service. + pub(crate) activity_id: Uuid, + /// Raw response payload bytes. + pub(crate) body: Vec<u8>, + /// Continuation token for feed-style operations. + pub(crate) continuation_token: Option<String>, + /// Entity tag returned by the service. + pub(crate) etag: Option<String>, + /// Retry-after delay in milliseconds. + pub(crate) retry_after_ms: Option<u32>, + /// Logical sequence number. + pub(crate) lsn: Option<i64>, + /// Request charge in request units.
+ pub(crate) request_charge: Option<f64>, + /// Owner full name metadata. + pub(crate) owner_full_name: Option<String>, + /// Partition key range identifier. + pub(crate) partition_key_range_id: Option<String>, + /// Item logical sequence number. + pub(crate) item_lsn: Option<i64>, + /// Global committed logical sequence number. + pub(crate) global_committed_lsn: Option<i64>, + /// Transport request identifier. + pub(crate) transport_request_id: Option<u32>, + /// Session token for session consistency. + pub(crate) session_token: Option<String>, +} + +impl RntbdResponse { + /// Deserializes a Gateway 2.0 RNTBD response frame. + /// + /// Unknown metadata token IDs are silently consumed when their token type is + /// known. Malformed token values and unknown token type bytes return errors. + /// The Slice 1 frame shape has no separate metadata length, so the parser + /// advances the tracking offset by decoding complete metadata tokens and + /// preserves any trailing bytes shorter than a token header as body bytes. + pub(crate) fn deserialize(bytes: &[u8]) -> azure_core::Result<Self> { + let mut src = bytes; + let total_len = read_u32_le(&mut src)?
as usize; + if total_len > bytes.len() { + return Err(data_conversion_error(format!( + "RNTBD response length {total_len} exceeds buffer length {}", + bytes.len() + ))); + } + if total_len < 24 { + return Err(data_conversion_error(format!( + "RNTBD response length {total_len} is smaller than the 24-byte header" + ))); + } + + let mut frame = &bytes[4..total_len]; + let http_status = read_u32_le(&mut frame)?; + let activity_id = read_uuid_le(&mut frame)?; + + let mut continuation_token = None; + let mut etag = None; + let mut retry_after_ms = None; + let mut lsn = None; + let mut request_charge = None; + let mut owner_full_name = None; + let mut sub_status = None; + let mut partition_key_range_id = None; + let mut item_lsn = None; + let mut global_committed_lsn = None; + let mut transport_request_id = None; + let mut session_token = None; + + while frame.len() >= 3 { + let token = Token::read_from(&mut frame)?; + match RntbdResponseToken::try_from(token.id) { + Ok(RntbdResponseToken::ContinuationToken) => { + continuation_token = Some(expect_string(token, "ContinuationToken")?); + } + Ok(RntbdResponseToken::ETag) => { + etag = Some(expect_string(token, "ETag")?); + } + Ok(RntbdResponseToken::RetryAfterMilliseconds) => { + retry_after_ms = Some(expect_u32(token, "RetryAfterMilliseconds")?); + } + Ok(RntbdResponseToken::Lsn) => { + lsn = Some(expect_i64(token, "LSN")?); + } + Ok(RntbdResponseToken::RequestCharge) => { + request_charge = Some(expect_f64(token, "RequestCharge")?); + } + Ok(RntbdResponseToken::OwnerFullName) => { + owner_full_name = Some(expect_string(token, "OwnerFullName")?); + } + Ok(RntbdResponseToken::SubStatus) => { + sub_status = Some(expect_u32(token, "SubStatus")?); + } + Ok(RntbdResponseToken::PartitionKeyRangeId) => { + partition_key_range_id = Some(expect_string(token, "PartitionKeyRangeId")?); + } + Ok(RntbdResponseToken::ItemLsn) => { + item_lsn = Some(expect_i64(token, "ItemLSN")?); + } + Ok(RntbdResponseToken::GlobalCommittedLsn) => { + 
+ global_committed_lsn = Some(expect_i64(token, "GlobalCommittedLSN")?); + } + Ok(RntbdResponseToken::TransportRequestId) => { + transport_request_id = Some(expect_u32(token, "TransportRequestID")?); + } + Ok(RntbdResponseToken::SessionToken) => { + session_token = Some(expect_string(token, "SessionToken")?); + } + Err(()) => {} + } + } + + Ok(Self { + status: map_rntbd_status_to_cosmos_status(http_status, sub_status), + activity_id, + body: frame.to_vec(), + continuation_token, + etag, + retry_after_ms, + lsn, + request_charge, + owner_full_name, + partition_key_range_id, + item_lsn, + global_committed_lsn, + transport_request_id, + session_token, + }) + } +} + +fn expect_string(token: Token, name: &str) -> azure_core::Result<String> { + match token.value { + TokenValue::String(value) => Ok(value), + _ => Err(unexpected_token_type(name)), + } +} + +fn expect_u32(token: Token, name: &str) -> azure_core::Result<u32> { + match token.value { + TokenValue::ULong(value) => Ok(value), + _ => Err(unexpected_token_type(name)), + } +} + +fn expect_i64(token: Token, name: &str) -> azure_core::Result<i64> { + match token.value { + TokenValue::LongLong(value) => Ok(value), + _ => Err(unexpected_token_type(name)), + } +} + +fn expect_f64(token: Token, name: &str) -> azure_core::Result<f64> { + match token.value { + TokenValue::Double(value) => Ok(value), + _ => Err(unexpected_token_type(name)), + } +} + +fn unexpected_token_type(name: &str) -> azure_core::Error { + data_conversion_error(format!("RNTBD token {name} had an unexpected value type")) +} + +#[cfg(test)] +mod tests { + use super::*; + use azure_core::http::StatusCode; + + use crate::driver::transport::rntbd::tokens::write_uuid_le; + + #[test] + fn unknown_token_id_is_silently_skipped() { + let mut frame = response_header(StatusCode::Ok); + Token::new(0x0015, TokenValue::Double(1.5)) + .write_to(&mut frame) + .unwrap(); + Token::new(0xFFFE, TokenValue::SmallString("hello".to_owned())) + .write_to(&mut frame) + .unwrap(); + Token::new(0x001C,
TokenValue::ULong(1002)) + .write_to(&mut frame) + .unwrap(); + patch_total_len(&mut frame); + + let response = RntbdResponse::deserialize(&frame).unwrap(); + + assert_eq!(response.status.status_code(), StatusCode::Ok); + assert_eq!(response.status.sub_status().unwrap().value(), 1002); + assert_eq!(response.request_charge, Some(1.5)); + assert!(response.body.is_empty()); + } + + #[test] + fn total_length_past_buffer_is_rejected() { + let mut frame = response_header(StatusCode::Ok); + let total_len = (frame.len() as u32) + 1; + frame[0..4].copy_from_slice(&total_len.to_le_bytes()); + + let err = RntbdResponse::deserialize(&frame).unwrap_err(); + + assert_eq!(*err.kind(), azure_core::error::ErrorKind::DataConversion); + } + + #[test] + fn trailing_bytes_shorter_than_token_header_are_body() { + let mut frame = response_header(StatusCode::Ok); + frame.extend_from_slice(&[0xAA, 0xBB]); + patch_total_len(&mut frame); + + let response = RntbdResponse::deserialize(&frame).unwrap(); + + assert_eq!(response.body, vec![0xAA, 0xBB]); + } + + #[test] + fn metadata_before_short_body_is_preserved() { + let mut frame = response_header(StatusCode::Ok); + Token::new(0x0015, TokenValue::Double(2.5)) + .write_to(&mut frame) + .unwrap(); + frame.extend_from_slice(&[0xAA, 0xBB]); + patch_total_len(&mut frame); + + let response = RntbdResponse::deserialize(&frame).unwrap(); + + assert_eq!(response.request_charge, Some(2.5)); + assert_eq!(response.body, vec![0xAA, 0xBB]); + } + + fn response_header(status_code: StatusCode) -> Vec<u8> { + let mut frame = Vec::new(); + frame.extend_from_slice(&0_u32.to_le_bytes()); + frame.extend_from_slice(&u16::from(status_code).to_le_bytes()); + frame.extend_from_slice(&0_u16.to_le_bytes()); + write_uuid_le(&mut frame, Uuid::nil()); + frame + } + + fn patch_total_len(frame: &mut [u8]) { + let total_len = u32::try_from(frame.len()).unwrap(); + frame[0..4].copy_from_slice(&total_len.to_le_bytes()); + } +} diff --git
a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/status.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/status.rs new file mode 100644 index 00000000000..044d43d4483 --- /dev/null +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/status.rs @@ -0,0 +1,46 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +//! RNTBD status mapping helpers. + +use azure_core::http::StatusCode; + +use crate::models::CosmosStatus; + +/// Maps RNTBD frame status fields into a [`CosmosStatus`]. +/// +/// RNTBD carries the HTTP status in the frame header and the Cosmos DB +/// sub-status as an optional metadata token. +pub(crate) fn map_rntbd_status_to_cosmos_status( + http_status: u32, + sub_status: Option<u32>, +) -> CosmosStatus { + let status = StatusCode::from(http_status as u16); + let mut cosmos_status = CosmosStatus::new(status); + if let Some(sub_status) = sub_status { + cosmos_status = cosmos_status.with_sub_status(sub_status); + } + cosmos_status +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn maps_http_status_and_sub_status() { + let status = map_rntbd_status_to_cosmos_status(404, Some(1002)); + + assert_eq!(status.status_code(), StatusCode::NotFound); + assert_eq!(status.sub_status().unwrap().value(), 1002); + assert_eq!(status.name(), Some("ReadSessionNotAvailable")); + } + + #[test] + fn unknown_http_status_is_preserved() { + let status = map_rntbd_status_to_cosmos_status(449, None); + + assert_eq!(status.status_code(), StatusCode::UnknownValue(449)); + assert_eq!(status.sub_status(), None); + } +} diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/tokens.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/tokens.rs new file mode 100644 index 00000000000..62a3303baee --- /dev/null +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/tokens.rs @@ -0,0 +1,637 @@ +// Copyright (c) Microsoft Corporation.
All rights reserved. +// Licensed under the MIT License. + +//! RNTBD metadata token codecs and wire ID mappings. + +use azure_core::error::ErrorKind; +use uuid::Uuid; + +use crate::models::{OperationType, ResourceType}; + +/// The token type byte used by RNTBD metadata tokens. +/// +/// Variable-width types carry their own length prefix in the value payload. +#[derive(Clone, Copy, Debug, PartialEq, Eq)] +pub(crate) enum TokenType { + /// Single unsigned byte. + Byte, + /// Unsigned 16-bit integer encoded little-endian. + UShort, + /// Unsigned 32-bit integer encoded little-endian. + ULong, + /// Signed 32-bit integer encoded little-endian. + Long, + /// Unsigned 64-bit integer encoded little-endian. + ULongLong, + /// Signed 64-bit integer encoded little-endian. + LongLong, + /// UUID encoded in Microsoft GUID byte order. + Guid, + /// UTF-8 string prefixed with an unsigned byte length. + SmallString, + /// UTF-8 string prefixed with an unsigned 16-bit length. + String, + /// UTF-8 string prefixed with an unsigned 32-bit length. + ULongString, + /// Bytes prefixed with an unsigned byte length. + SmallBytes, + /// Bytes prefixed with an unsigned 16-bit length. + Bytes, + /// Bytes prefixed with an unsigned 32-bit length. + ULongBytes, + /// 32-bit floating point value encoded little-endian. + Float, + /// 64-bit floating point value encoded little-endian. + Double, + /// Invalid token type sentinel. 
+ Invalid, +} + +impl TryFrom<u8> for TokenType { + type Error = azure_core::Error; + + fn try_from(value: u8) -> azure_core::Result<Self> { + match value { + 0x00 => Ok(Self::Byte), + 0x01 => Ok(Self::UShort), + 0x02 => Ok(Self::ULong), + 0x03 => Ok(Self::Long), + 0x04 => Ok(Self::ULongLong), + 0x05 => Ok(Self::LongLong), + 0x06 => Ok(Self::Guid), + 0x07 => Ok(Self::SmallString), + 0x08 => Ok(Self::String), + 0x09 => Ok(Self::ULongString), + 0x0A => Ok(Self::SmallBytes), + 0x0B => Ok(Self::Bytes), + 0x0C => Ok(Self::ULongBytes), + 0x0D => Ok(Self::Float), + 0x0E => Ok(Self::Double), + 0xFF => Ok(Self::Invalid), + other => Err(data_conversion_error(format!( + "unknown RNTBD token type 0x{other:02X}" + ))), + } + } +} + +impl From<TokenType> for u8 { + fn from(value: TokenType) -> Self { + match value { + TokenType::Byte => 0x00, + TokenType::UShort => 0x01, + TokenType::ULong => 0x02, + TokenType::Long => 0x03, + TokenType::ULongLong => 0x04, + TokenType::LongLong => 0x05, + TokenType::Guid => 0x06, + TokenType::SmallString => 0x07, + TokenType::String => 0x08, + TokenType::ULongString => 0x09, + TokenType::SmallBytes => 0x0A, + TokenType::Bytes => 0x0B, + TokenType::ULongBytes => 0x0C, + TokenType::Float => 0x0D, + TokenType::Double => 0x0E, + TokenType::Invalid => 0xFF, + } + } +} + +/// A decoded RNTBD metadata token value. +/// +/// The enum variant determines the value codec used on the wire. +#[derive(Clone, Debug, PartialEq)] +pub(crate) enum TokenValue { + /// Single unsigned byte. + Byte(u8), + /// Unsigned 16-bit integer. + UShort(u16), + /// Unsigned 32-bit integer. + ULong(u32), + /// Signed 32-bit integer. + Long(i32), + /// Unsigned 64-bit integer. + ULongLong(u64), + /// Signed 64-bit integer. + LongLong(i64), + /// UUID in Microsoft GUID token byte order. + Guid(Uuid), + /// UTF-8 string with an unsigned byte length prefix. + SmallString(String), + /// UTF-8 string with an unsigned 16-bit length prefix.
+ String(String), + /// UTF-8 string with an unsigned 32-bit length prefix. + ULongString(String), + /// Bytes with an unsigned byte length prefix. + SmallBytes(Vec<u8>), + /// Bytes with an unsigned 16-bit length prefix. + Bytes(Vec<u8>), + /// Bytes with an unsigned 32-bit length prefix. + ULongBytes(Vec<u8>), + /// 32-bit floating point value. + Float(f32), + /// 64-bit floating point value. + Double(f64), +} + +impl TokenValue { + fn token_type(&self) -> TokenType { + match self { + Self::Byte(_) => TokenType::Byte, + Self::UShort(_) => TokenType::UShort, + Self::ULong(_) => TokenType::ULong, + Self::Long(_) => TokenType::Long, + Self::ULongLong(_) => TokenType::ULongLong, + Self::LongLong(_) => TokenType::LongLong, + Self::Guid(_) => TokenType::Guid, + Self::SmallString(_) => TokenType::SmallString, + Self::String(_) => TokenType::String, + Self::ULongString(_) => TokenType::ULongString, + Self::SmallBytes(_) => TokenType::SmallBytes, + Self::Bytes(_) => TokenType::Bytes, + Self::ULongBytes(_) => TokenType::ULongBytes, + Self::Float(_) => TokenType::Float, + Self::Double(_) => TokenType::Double, + } + } + + fn encoded_len(&self) -> usize { + match self { + Self::Byte(_) => 1, + Self::UShort(_) => 2, + Self::ULong(_) | Self::Long(_) | Self::Float(_) => 4, + Self::ULongLong(_) | Self::LongLong(_) | Self::Double(_) => 8, + Self::Guid(_) => 16, + Self::SmallString(value) => 1 + value.len(), + Self::String(value) => 2 + value.len(), + Self::ULongString(value) => 4 + value.len(), + Self::SmallBytes(value) => 1 + value.len(), + Self::Bytes(value) => 2 + value.len(), + Self::ULongBytes(value) => 4 + value.len(), + } + } + + fn write_to(&self, out: &mut Vec<u8>) -> azure_core::Result<()> { + match self { + Self::Byte(value) => out.push(*value), + Self::UShort(value) => out.extend_from_slice(&value.to_le_bytes()), + Self::ULong(value) => out.extend_from_slice(&value.to_le_bytes()), + Self::Long(value) => out.extend_from_slice(&value.to_le_bytes()), + Self::ULongLong(value) =>
out.extend_from_slice(&value.to_le_bytes()), + Self::LongLong(value) => out.extend_from_slice(&value.to_le_bytes()), + Self::Guid(value) => write_guid_ms(out, *value), + Self::SmallString(value) => write_len_prefixed_u8(out, value.as_bytes())?, + Self::String(value) => write_len_prefixed_u16(out, value.as_bytes())?, + Self::ULongString(value) => write_len_prefixed_u32(out, value.as_bytes())?, + Self::SmallBytes(value) => write_len_prefixed_u8(out, value)?, + Self::Bytes(value) => write_len_prefixed_u16(out, value)?, + Self::ULongBytes(value) => write_len_prefixed_u32(out, value)?, + Self::Float(value) => out.extend_from_slice(&value.to_le_bytes()), + Self::Double(value) => out.extend_from_slice(&value.to_le_bytes()), + } + Ok(()) + } + + fn read_from(token_type: TokenType, src: &mut &[u8]) -> azure_core::Result<Self> { + match token_type { + TokenType::Byte => Ok(Self::Byte(read_u8(src)?)), + TokenType::UShort => Ok(Self::UShort(read_u16_le(src)?)), + TokenType::ULong => Ok(Self::ULong(read_u32_le(src)?)), + TokenType::Long => Ok(Self::Long(read_i32_le(src)?)), + TokenType::ULongLong => Ok(Self::ULongLong(read_u64_le(src)?)), + TokenType::LongLong => Ok(Self::LongLong(read_i64_le(src)?)), + TokenType::Guid => Ok(Self::Guid(read_guid_ms(src)?)), + TokenType::SmallString => { + let len = read_u8(src)? as usize; + Ok(Self::SmallString(read_utf8(src, len)?)) + } + TokenType::String => { + let len = read_u16_le(src)? as usize; + Ok(Self::String(read_utf8(src, len)?)) + } + TokenType::ULongString => { + let len = read_u32_le(src)? as usize; + Ok(Self::ULongString(read_utf8(src, len)?)) + } + TokenType::SmallBytes => { + let len = read_u8(src)? as usize; + Ok(Self::SmallBytes( + read_exact(src, len, "small bytes")?.to_vec(), + )) + } + TokenType::Bytes => { + let len = read_u16_le(src)? as usize; + Ok(Self::Bytes(read_exact(src, len, "bytes")?.to_vec())) + } + TokenType::ULongBytes => { + let len = read_u32_le(src)?
as usize; + Ok(Self::ULongBytes( + read_exact(src, len, "ulong bytes")?.to_vec(), + )) + } + TokenType::Float => Ok(Self::Float(f32::from_le_bytes(read_array(src)?))), + TokenType::Double => Ok(Self::Double(f64::from_le_bytes(read_array(src)?))), + TokenType::Invalid => Err(data_conversion_error( + "invalid RNTBD token type sentinel encountered", + )), + } + } +} + +/// A single RNTBD metadata token. +/// +/// Tokens are encoded as a two-byte token ID, a one-byte [`TokenType`], and the +/// value bytes for that type. +#[derive(Clone, Debug, PartialEq)] +pub(crate) struct Token { + /// Token identifier from the RNTBD header table. + pub(crate) id: u16, + /// Decoded token value. + pub(crate) value: TokenValue, +} + +impl Token { + /// Creates a metadata token from an ID and typed value. + pub(crate) fn new(id: u16, value: TokenValue) -> Self { + Self { id, value } + } + + /// Returns the number of bytes this token occupies on the wire. + pub(super) fn encoded_len(&self) -> usize { + 2 + 1 + self.value.encoded_len() + } + + /// Writes this token to the output buffer. + /// + /// Returns an error if the token value exceeds the wire encoding's length + /// limits (e.g., a `SmallString` longer than 255 bytes). + pub(super) fn write_to(&self, out: &mut Vec<u8>) -> azure_core::Result<()> { + out.extend_from_slice(&self.id.to_le_bytes()); + out.push(self.value.token_type().into()); + self.value.write_to(out) + } + + /// Reads a token from the input slice and advances the slice. + pub(super) fn read_from(src: &mut &[u8]) -> azure_core::Result<Self> { + let id = read_u16_le(src)?; + let token_type = TokenType::try_from(read_u8(src)?)?; + let value = TokenValue::read_from(token_type, src)?; + Ok(Self { id, value }) + } +} + +/// RNTBD response metadata token IDs recognized by Slice 1. +pub(super) enum RntbdResponseToken { + /// Continuation token. + ContinuationToken, + /// Entity tag. + ETag, + /// Retry-after delay in milliseconds.
+ RetryAfterMilliseconds, + /// Logical sequence number. + Lsn, + /// Request charge in request units. + RequestCharge, + /// Owner full name. + OwnerFullName, + /// Cosmos DB sub-status code. + SubStatus, + /// Partition key range identifier. + PartitionKeyRangeId, + /// Item logical sequence number. + ItemLsn, + /// Global committed logical sequence number. + GlobalCommittedLsn, + /// Transport request identifier. + TransportRequestId, + /// Session token. + SessionToken, +} + +impl TryFrom for RntbdResponseToken { + type Error = (); + + fn try_from(value: u16) -> Result { + match value { + 0x0003 => Ok(Self::ContinuationToken), + 0x0004 => Ok(Self::ETag), + 0x000C => Ok(Self::RetryAfterMilliseconds), + 0x0013 => Ok(Self::Lsn), + 0x0015 => Ok(Self::RequestCharge), + 0x0017 => Ok(Self::OwnerFullName), + 0x001C => Ok(Self::SubStatus), + 0x0021 => Ok(Self::PartitionKeyRangeId), + 0x0032 => Ok(Self::ItemLsn), + 0x0029 => Ok(Self::GlobalCommittedLsn), + 0x0035 => Ok(Self::TransportRequestId), + 0x003E => Ok(Self::SessionToken), + _ => Err(()), + } + } +} + +/// RNTBD resource type wire ID. +#[derive(Clone, Copy, Debug, PartialEq, Eq)] +pub(super) struct RntbdResourceType(u16); + +impl RntbdResourceType { + /// Returns the underlying RNTBD resource type ID. 
+ pub(super) fn value(self) -> u16 { + self.0 + } +} + +impl From for RntbdResourceType { + fn from(value: ResourceType) -> Self { + let id = match value { + ResourceType::DatabaseAccount => 0x0014, + ResourceType::Database => 0x0001, + ResourceType::DocumentCollection => 0x0002, + ResourceType::Document => 0x0003, + ResourceType::StoredProcedure => 0x0007, + ResourceType::Trigger => 0x0009, + ResourceType::UserDefinedFunction => 0x000A, + ResourceType::PartitionKeyRange => 0x0016, + ResourceType::Offer => 0x000F, + }; + Self(id) + } +} + +impl TryFrom for RntbdResourceType { + type Error = azure_core::Error; + + fn try_from(value: u16) -> azure_core::Result { + match value { + 0x0014 | 0x0001 | 0x0002 | 0x0003 | 0x0007 | 0x0009 | 0x000A | 0x0016 | 0x000F => { + Ok(Self(value)) + } + other => Err(data_conversion_error(format!( + "unknown RNTBD resource type 0x{other:04X}" + ))), + } + } +} + +impl TryFrom for ResourceType { + type Error = azure_core::Error; + + fn try_from(value: RntbdResourceType) -> azure_core::Result { + match value.0 { + 0x0014 => Ok(Self::DatabaseAccount), + 0x0001 => Ok(Self::Database), + 0x0002 => Ok(Self::DocumentCollection), + 0x0003 => Ok(Self::Document), + 0x0007 => Ok(Self::StoredProcedure), + 0x0009 => Ok(Self::Trigger), + 0x000A => Ok(Self::UserDefinedFunction), + 0x0016 => Ok(Self::PartitionKeyRange), + 0x000F => Ok(Self::Offer), + _ => Err(data_conversion_error("unknown RNTBD resource type")), + } + } +} + +/// RNTBD operation type wire ID. +#[derive(Clone, Copy, Debug, PartialEq, Eq)] +pub(super) struct RntbdOperationType(u16); + +impl RntbdOperationType { + /// Returns the underlying RNTBD operation type ID. 
+ pub(super) fn value(self) -> u16 { + self.0 + } +} + +impl From for RntbdOperationType { + fn from(value: OperationType) -> Self { + let id = match value { + OperationType::Create => 0x0001, + OperationType::Read => 0x0003, + OperationType::ReadFeed => 0x0004, + OperationType::Delete => 0x0005, + OperationType::Replace => 0x0006, + OperationType::Execute => 0x0008, + OperationType::SqlQuery => 0x0009, + // QueryPlan has no distinct Gateway 2.0 wire ID; Java encodes it as SqlQuery + // with additional metadata that lands in a later slice. + OperationType::QueryPlan => 0x0009, + OperationType::Query => 0x000F, + OperationType::Head => 0x0011, + OperationType::HeadFeed => 0x0012, + OperationType::Upsert => 0x0013, + OperationType::Batch => 0x0025, + }; + Self(id) + } +} + +impl TryFrom for RntbdOperationType { + type Error = azure_core::Error; + + fn try_from(value: u16) -> azure_core::Result { + match value { + 0x0001 | 0x0003 | 0x0004 | 0x0005 | 0x0006 | 0x0008 | 0x0009 | 0x000F | 0x0011 + | 0x0012 | 0x0013 | 0x0025 => Ok(Self(value)), + other => Err(data_conversion_error(format!( + "unknown RNTBD operation type 0x{other:04X}" + ))), + } + } +} + +impl TryFrom for OperationType { + type Error = azure_core::Error; + + fn try_from(value: RntbdOperationType) -> azure_core::Result { + match value.0 { + 0x0001 => Ok(Self::Create), + 0x0003 => Ok(Self::Read), + 0x0004 => Ok(Self::ReadFeed), + 0x0005 => Ok(Self::Delete), + 0x0006 => Ok(Self::Replace), + 0x0008 => Ok(Self::Execute), + 0x0009 => Ok(Self::SqlQuery), + 0x000F => Ok(Self::Query), + 0x0011 => Ok(Self::Head), + 0x0012 => Ok(Self::HeadFeed), + 0x0013 => Ok(Self::Upsert), + 0x0025 => Ok(Self::Batch), + _ => Err(data_conversion_error("unknown RNTBD operation type")), + } + } +} + +/// Creates a data-conversion error for malformed RNTBD input. 
+pub(super) fn data_conversion_error(message: impl Into) -> azure_core::Error { + azure_core::Error::with_message(ErrorKind::DataConversion, message.into()) +} + +/// Writes a UUID using the Gateway 2.0 activity ID byte order. +/// +/// The wire form is the UUID most-significant 64 bits in little-endian order +/// followed by the least-significant 64 bits in little-endian order. +pub(super) fn write_uuid_le(out: &mut Vec, id: Uuid) { + let value = id.as_u128(); + let msb = (value >> 64) as u64; + let lsb = value as u64; + out.extend_from_slice(&msb.to_le_bytes()); + out.extend_from_slice(&lsb.to_le_bytes()); +} + +/// Reads a UUID using the Gateway 2.0 activity ID byte order. +pub(super) fn read_uuid_le(src: &mut &[u8]) -> azure_core::Result { + let msb = read_u64_le(src)?; + let lsb = read_u64_le(src)?; + Ok(Uuid::from_u128(((msb as u128) << 64) | lsb as u128)) +} + +/// Reads an unsigned byte from the input slice. +pub(super) fn read_u8(src: &mut &[u8]) -> azure_core::Result { + Ok(read_exact(src, 1, "u8")?[0]) +} + +/// Reads an unsigned 16-bit little-endian integer from the input slice. +pub(super) fn read_u16_le(src: &mut &[u8]) -> azure_core::Result { + Ok(u16::from_le_bytes(read_array(src)?)) +} + +/// Reads an unsigned 32-bit little-endian integer from the input slice. +pub(super) fn read_u32_le(src: &mut &[u8]) -> azure_core::Result { + Ok(u32::from_le_bytes(read_array(src)?)) +} + +/// Reads an unsigned 64-bit little-endian integer from the input slice. 
+pub(super) fn read_u64_le(src: &mut &[u8]) -> azure_core::Result { + Ok(u64::from_le_bytes(read_array(src)?)) +} + +fn read_i32_le(src: &mut &[u8]) -> azure_core::Result { + Ok(i32::from_le_bytes(read_array(src)?)) +} + +fn read_i64_le(src: &mut &[u8]) -> azure_core::Result { + Ok(i64::from_le_bytes(read_array(src)?)) +} + +fn read_array(src: &mut &[u8]) -> azure_core::Result<[u8; N]> { + let bytes = read_exact(src, N, "fixed-width value")?; + let mut out = [0_u8; N]; + out.copy_from_slice(bytes); + Ok(out) +} + +fn read_exact<'a>(src: &mut &'a [u8], len: usize, context: &str) -> azure_core::Result<&'a [u8]> { + if src.len() < len { + return Err(data_conversion_error(format!( + "RNTBD {context} needs {len} bytes but only {} remain", + src.len() + ))); + } + let (head, tail) = src.split_at(len); + *src = tail; + Ok(head) +} + +fn read_utf8(src: &mut &[u8], len: usize) -> azure_core::Result { + let bytes = read_exact(src, len, "UTF-8 string")?; + String::from_utf8(bytes.to_vec()) + .map_err(|e| azure_core::Error::new(ErrorKind::DataConversion, e)) +} + +fn write_len_prefixed_u8(out: &mut Vec, bytes: &[u8]) -> azure_core::Result<()> { + let len = u8::try_from(bytes.len()).map_err(|_| { + data_conversion_error(format!( + "RNTBD value length {} exceeds u8 length-prefix maximum (255)", + bytes.len() + )) + })?; + out.push(len); + out.extend_from_slice(bytes); + Ok(()) +} + +fn write_len_prefixed_u16(out: &mut Vec, bytes: &[u8]) -> azure_core::Result<()> { + let len = u16::try_from(bytes.len()).map_err(|_| { + data_conversion_error(format!( + "RNTBD value length {} exceeds u16 length-prefix maximum (65535)", + bytes.len() + )) + })?; + out.extend_from_slice(&len.to_le_bytes()); + out.extend_from_slice(bytes); + Ok(()) +} + +fn write_len_prefixed_u32(out: &mut Vec, bytes: &[u8]) -> azure_core::Result<()> { + let len = u32::try_from(bytes.len()).map_err(|_| { + data_conversion_error(format!( + "RNTBD value length {} exceeds u32 length-prefix maximum (4294967295)", + 
bytes.len() + )) + })?; + out.extend_from_slice(&len.to_le_bytes()); + out.extend_from_slice(bytes); + Ok(()) +} + +fn write_guid_ms(out: &mut Vec, id: Uuid) { + let (data1, data2, data3, data4) = id.as_fields(); + out.extend_from_slice(&data1.to_le_bytes()); + out.extend_from_slice(&data2.to_le_bytes()); + out.extend_from_slice(&data3.to_le_bytes()); + out.extend_from_slice(data4); +} + +fn read_guid_ms(src: &mut &[u8]) -> azure_core::Result { + let data1 = u32::from_le_bytes(read_array(src)?); + let data2 = u16::from_le_bytes(read_array(src)?); + let data3 = u16::from_le_bytes(read_array(src)?); + let data4 = read_array(src)?; + Ok(Uuid::from_fields(data1, data2, data3, &data4)) +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn activity_id_uses_msb_lsb_little_endian_order() { + let id = Uuid::parse_str("0a1b2c3d-4e5f-6789-abcd-ef0123456789").unwrap(); + let mut bytes = Vec::new(); + + write_uuid_le(&mut bytes, id); + + assert_eq!( + bytes, + vec![ + 0x89, 0x67, 0x5f, 0x4e, 0x3d, 0x2c, 0x1b, 0x0a, 0x89, 0x67, 0x45, 0x23, 0x01, 0xef, + 0xcd, 0xab, + ] + ); + + let mut src = bytes.as_slice(); + let decoded = read_uuid_le(&mut src).unwrap(); + assert_eq!(decoded, id); + assert!(src.is_empty()); + } + + #[test] + fn invalid_token_type_sentinel_is_rejected() { + let mut src = [0x01, 0x00, 0xFF].as_slice(); + + let err = Token::read_from(&mut src).unwrap_err(); + + assert_eq!(*err.kind(), ErrorKind::DataConversion); + } + + #[test] + fn small_string_rejects_length_past_remaining_buffer() { + let mut src = [0x01, 0x00, 0x07, 0x05, b'h', b'i'].as_slice(); + + let err = Token::read_from(&mut src).unwrap_err(); + + assert_eq!(*err.kind(), ErrorKind::DataConversion); + } +} From 802e4795be7031162546f32d627900a8fe116fa1 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Wed, 29 Apr 2026 20:34:07 -0700 Subject: [PATCH 26/48] Add Gateway 2.0 foundation (Slice 2) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Lays the 
foundation for Gateway 2.0 request handling without yet routing any traffic through it. Three deliverables:

* **Eligibility helper**: a pure `is_operation_supported_by_gateway20` that returns `true` only for Document × {Create, Read, Replace, Upsert, Delete, Query, SqlQuery, QueryPlan, ReadFeed, Batch}. Both the outer `ResourceType` match and the inner `OperationType` match are exhaustive (no wildcard arms), so adding a variant to either enum is a compile-time error. This forces an explicit eligibility decision instead of a silent fail-closed default.
* **Account-name extraction**: `AccountEndpoint::global_database_account_name` parses the host's first label and returns it for Cosmos endpoints (`*.documents.azure.{com,us,cn}`). It returns `None` for the emulator, IPv4/IPv6 literals, and custom domains. Slice 3 will read this value when emitting the RNTBD `GlobalDatabaseAccountName` token.
* **Constants relocation**: the new `azure_data_cosmos_driver::constants` module owns the canonical `x-ms-thinclient-*`, `x-ms-cosmos-use-thinclient`, and `x-ms-effective-partition-key` wire strings. The Gateway 2.0 headers use `GATEWAY20_*` identifiers; the effective-partition-key header is `EFFECTIVE_PARTITION_KEY`. The SDK's `THINCLIENT_PROXY_OPERATION_TYPE` / `_RESOURCE_TYPE` are now `#[deprecated]` re-exports of the new driver constants, which preserves the public API while consumers migrate. `COSMOS_ALLOWED_HEADERS` now also lists the two relocated proxy headers, so logging behavior is unchanged.

The helper and the account-name accessor are intentionally `#[allow(dead_code)]` in this slice; Slice 3 wires them into the dispatch path. This slice makes no routing, body-wrapping, header-injection, or response-unwrap changes.

Tests: +5 unit tests (constants pinning + distinctness, eligibility matrix exhaustiveness, stored-proc explicit ineligibility, host extraction table). Total 721 lib tests (was 716), all passing.

Validation: cargo fmt, cargo clippy --all-features --all-targets -- -D warnings, cargo doc --no-deps, and cargo test --all-features all clean for both azure_data_cosmos and azure_data_cosmos_driver.
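The account-name rule can be sketched in isolation over a plain host string. This is a minimal illustration, not the driver's `AccountEndpoint` accessor. The function name `account_name_from_host` and the explicit suffix list are assumptions that mirror the `*.documents.azure.{com,us,cn}` pattern above, and the input is assumed to be a lowercase, port-free host (such as the value `Url::host_str` yields).

```rust
/// Hypothetical helper: returns the first DNS label of `host` when the host is
/// `<label>.documents.azure.{com,us,cn}`, and `None` otherwise.
/// Assumes `host` is already lowercase and carries no port.
pub fn account_name_from_host(host: &str) -> Option<String> {
    // Assumed suffix list, taken from the pattern documented in this message.
    const COSMOS_SUFFIXES: [&str; 3] = [
        ".documents.azure.com",
        ".documents.azure.us",
        ".documents.azure.cn",
    ];

    for suffix in COSMOS_SUFFIXES {
        if let Some(label) = host.strip_suffix(suffix) {
            // Exactly one non-empty label may precede the suffix.
            return if label.is_empty() || label.contains('.') {
                None
            } else {
                Some(label.to_owned())
            };
        }
    }

    // Emulator hosts (`localhost`), IPv4/IPv6 literals, and custom domains
    // never end in one of the suffixes above.
    None
}
```

Matching the whole suffix is a choice made in this sketch. Unlike a check that only requires the remainder to start with `documents.`, it also rejects hosts such as `acct.documents.example.com`. It also needs no separate IP-literal test, because no IP literal ends in one of these suffixes.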
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- sdk/cosmos/azure_data_cosmos/src/constants.rs | 25 +++- .../azure_data_cosmos_driver/src/constants.rs | 108 ++++++++++++++ .../driver/transport/gateway20_eligibility.rs | 141 ++++++++++++++++++ .../src/driver/transport/mod.rs | 2 + .../azure_data_cosmos_driver/src/lib.rs | 1 + .../src/models/account_reference.rs | 56 +++++++ 6 files changed, 330 insertions(+), 3 deletions(-) create mode 100644 sdk/cosmos/azure_data_cosmos_driver/src/constants.rs create mode 100644 sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/gateway20_eligibility.rs diff --git a/sdk/cosmos/azure_data_cosmos/src/constants.rs b/sdk/cosmos/azure_data_cosmos/src/constants.rs index 42aaaede32a..d4ddddd0283 100644 --- a/sdk/cosmos/azure_data_cosmos/src/constants.rs +++ b/sdk/cosmos/azure_data_cosmos/src/constants.rs @@ -22,6 +22,8 @@ macro_rules! cosmos_headers { /// A list of all Cosmos DB specific headers that should be allowed in logging. pub const COSMOS_ALLOWED_HEADERS: &[&HeaderName] = &[ $(&$name,)* + &azure_data_cosmos_driver::constants::GATEWAY20_OPERATION_TYPE, + &azure_data_cosmos_driver::constants::GATEWAY20_RESOURCE_TYPE, ]; }; } @@ -189,9 +191,6 @@ cosmos_headers! { COSMOS_QUORUM_ACKED_LLSN => "x-ms-cosmos-quorum-acked-llsn", REQUEST_DURATION_MS => "x-ms-request-duration-ms", COSMOS_INTERNAL_PARTITION_ID => "x-ms-cosmos-internal-partition-id", - // Thin Client - THINCLIENT_PROXY_OPERATION_TYPE => "x-ms-thinclient-proxy-operation-type", - THINCLIENT_PROXY_RESOURCE_TYPE => "x-ms-thinclient-proxy-resource-type", // Client ID CLIENT_ID => "x-ms-client-id", // these are not actually sent but are used internally for fault injection @@ -199,6 +198,26 @@ cosmos_headers! { FAULT_INJECTION_CONTAINER_ID => "x-ms-fault-injection-container-id", } +/// Deprecated alias for the Gateway 2.0 proxy operation-type header. +/// +/// Use the driver-level `GATEWAY20_OPERATION_TYPE` constant for new code. 
+#[deprecated( + since = "0.33.0", + note = "Use `azure_data_cosmos_driver::constants::GATEWAY20_OPERATION_TYPE` instead." +)] +pub const THINCLIENT_PROXY_OPERATION_TYPE: HeaderName = + azure_data_cosmos_driver::constants::GATEWAY20_OPERATION_TYPE; + +/// Deprecated alias for the Gateway 2.0 proxy resource-type header. +/// +/// Use the driver-level `GATEWAY20_RESOURCE_TYPE` constant for new code. +#[deprecated( + since = "0.33.0", + note = "Use `azure_data_cosmos_driver::constants::GATEWAY20_RESOURCE_TYPE` instead." +)] +pub const THINCLIENT_PROXY_RESOURCE_TYPE: HeaderName = + azure_data_cosmos_driver::constants::GATEWAY20_RESOURCE_TYPE; + pub const QUERY_CONTENT_TYPE: ContentType = ContentType::from_static("application/query+json"); pub(crate) const PREFER_MINIMAL: HeaderValue = HeaderValue::from_static("return=minimal"); diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/constants.rs b/sdk/cosmos/azure_data_cosmos_driver/src/constants.rs new file mode 100644 index 00000000000..e4804d4df75 --- /dev/null +++ b/sdk/cosmos/azure_data_cosmos_driver/src/constants.rs @@ -0,0 +1,108 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +// Don't spell-check header names (which should start with 'x-'). +// cSpell:disable + +//! Driver-level Cosmos DB constants. +//! +//! This module owns the canonical wire-name strings for the Gateway 2.0 +//! HTTP/2 outer headers. The wire strings retain the historical +//! `x-ms-thinclient-*` form because the proxy is server-defined; only the +//! Rust identifier follows the `GATEWAY20_*` naming convention. + +use azure_core::http::headers::HeaderName; + +/// Gateway 2.0 proxy operation-type header. +/// +/// Contains the numeric operation type on every Gateway 2.0 request. +pub const GATEWAY20_OPERATION_TYPE: HeaderName = + HeaderName::from_static("x-ms-thinclient-proxy-operation-type"); + +/// Gateway 2.0 proxy resource-type header. 
+/// +/// Contains the numeric resource type on every Gateway 2.0 request. +pub const GATEWAY20_RESOURCE_TYPE: HeaderName = + HeaderName::from_static("x-ms-thinclient-proxy-resource-type"); + +/// Effective Partition Key header. +/// +/// Sent for point Document operations only. +pub const EFFECTIVE_PARTITION_KEY: HeaderName = + HeaderName::from_static("x-ms-effective-partition-key"); + +/// Lower bound of the EPK range. +/// +/// Sent for feed and cross-partition operations only. +pub const GATEWAY20_RANGE_MIN: HeaderName = HeaderName::from_static("x-ms-thinclient-range-min"); + +/// Upper bound of the EPK range. +/// +/// Sent for feed and cross-partition operations only. +pub const GATEWAY20_RANGE_MAX: HeaderName = HeaderName::from_static("x-ms-thinclient-range-max"); + +/// Account-metadata fetch hint. +/// +/// Instructs the response to advertise thin-client endpoints. +pub const GATEWAY20_USE_THINCLIENT: HeaderName = + HeaderName::from_static("x-ms-cosmos-use-thinclient"); + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn constants_match_expected_wire_strings() { + let cases = [ + ( + GATEWAY20_OPERATION_TYPE, + HeaderName::from_static("x-ms-thinclient-proxy-operation-type"), + ), + ( + GATEWAY20_RESOURCE_TYPE, + HeaderName::from_static("x-ms-thinclient-proxy-resource-type"), + ), + ( + EFFECTIVE_PARTITION_KEY, + HeaderName::from_static("x-ms-effective-partition-key"), + ), + ( + GATEWAY20_RANGE_MIN, + HeaderName::from_static("x-ms-thinclient-range-min"), + ), + ( + GATEWAY20_RANGE_MAX, + HeaderName::from_static("x-ms-thinclient-range-max"), + ), + ( + GATEWAY20_USE_THINCLIENT, + HeaderName::from_static("x-ms-cosmos-use-thinclient"), + ), + ]; + + for (actual, expected) in cases { + assert_eq!(actual, expected); + } + } + + #[test] + fn constants_have_distinct_wire_strings() { + let constants = [ + ("GATEWAY20_OPERATION_TYPE", GATEWAY20_OPERATION_TYPE), + ("GATEWAY20_RESOURCE_TYPE", GATEWAY20_RESOURCE_TYPE), + ("EFFECTIVE_PARTITION_KEY", 
EFFECTIVE_PARTITION_KEY), + ("GATEWAY20_RANGE_MIN", GATEWAY20_RANGE_MIN), + ("GATEWAY20_RANGE_MAX", GATEWAY20_RANGE_MAX), + ("GATEWAY20_USE_THINCLIENT", GATEWAY20_USE_THINCLIENT), + ]; + + for (index, (left_name, left_header)) in constants.iter().enumerate() { + for (right_name, right_header) in constants.iter().skip(index + 1) { + assert_ne!( + left_header, right_header, + "{left_name} and {right_name} must not share a wire string" + ); + } + } + } +} diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/gateway20_eligibility.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/gateway20_eligibility.rs new file mode 100644 index 00000000000..508dc79d6fa --- /dev/null +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/gateway20_eligibility.rs @@ -0,0 +1,141 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +//! Gateway 2.0 operation eligibility filter. + +use crate::models::{OperationType, ResourceType}; + +/// Returns `true` when the resource and operation pair is eligible for Gateway 2.0. +/// +/// Only `ResourceType::Document` is currently eligible, matching Java's +/// `ThinClientStoreModel`. Stored-procedure execution is explicitly out of +/// scope for Rust SDK GA; every non-Document resource type falls back to +/// standard Gateway via the eligibility-fallback path. +/// +/// `OperationType::Patch` is not currently a variant on the Rust enum and is +/// therefore not handled here. When the variant is added in a future slice, +/// this match must be updated. +// Slice 3 wires this helper into routing. 
+#[allow(dead_code)] +pub(crate) fn is_operation_supported_by_gateway20( + resource_type: ResourceType, + operation_type: OperationType, +) -> bool { + // Both arms of this match are intentionally exhaustive (no wildcard `_` arm) so + // that adding a new variant to either enum is a compile-time error, forcing an + // explicit eligibility decision rather than a silent fail-closed default. + match resource_type { + ResourceType::Document => match operation_type { + OperationType::Create + | OperationType::Read + | OperationType::Replace + | OperationType::Upsert + | OperationType::Delete + | OperationType::Query + | OperationType::SqlQuery + | OperationType::QueryPlan + | OperationType::ReadFeed + | OperationType::Batch => true, + OperationType::Head | OperationType::HeadFeed | OperationType::Execute => false, + }, + ResourceType::DatabaseAccount + | ResourceType::Database + | ResourceType::DocumentCollection + | ResourceType::StoredProcedure + | ResourceType::Trigger + | ResourceType::UserDefinedFunction + | ResourceType::PartitionKeyRange + | ResourceType::Offer => false, + } +} + +#[cfg(test)] +mod tests { + use super::*; + + fn all_resource_types() -> [ResourceType; 9] { + [ + ResourceType::DatabaseAccount, + ResourceType::Database, + ResourceType::DocumentCollection, + ResourceType::Document, + ResourceType::StoredProcedure, + ResourceType::Trigger, + ResourceType::UserDefinedFunction, + ResourceType::PartitionKeyRange, + ResourceType::Offer, + ] + } + + fn all_operation_types() -> [OperationType; 13] { + [ + OperationType::Create, + OperationType::Read, + OperationType::ReadFeed, + OperationType::Replace, + OperationType::Delete, + OperationType::Upsert, + OperationType::Query, + OperationType::SqlQuery, + OperationType::QueryPlan, + OperationType::Batch, + OperationType::Head, + OperationType::HeadFeed, + OperationType::Execute, + ] + } + + fn expected_gateway20_eligibility( + resource_type: ResourceType, + operation_type: OperationType, + ) -> bool { + match 
resource_type { + ResourceType::Document => match operation_type { + OperationType::Create + | OperationType::Read + | OperationType::Replace + | OperationType::Upsert + | OperationType::Delete + | OperationType::Query + | OperationType::SqlQuery + | OperationType::QueryPlan + | OperationType::ReadFeed + | OperationType::Batch => true, + OperationType::Head | OperationType::HeadFeed | OperationType::Execute => false, + }, + ResourceType::DatabaseAccount + | ResourceType::Database + | ResourceType::DocumentCollection + | ResourceType::StoredProcedure + | ResourceType::Trigger + | ResourceType::UserDefinedFunction + | ResourceType::PartitionKeyRange + | ResourceType::Offer => false, + } + } + + #[test] + fn gateway20_eligibility_matrix_is_exhaustive() { + for resource_type in all_resource_types() { + for operation_type in all_operation_types() { + assert_eq!( + is_operation_supported_by_gateway20(resource_type, operation_type), + expected_gateway20_eligibility(resource_type, operation_type), + "unexpected Gateway 2.0 eligibility for {resource_type:?} {operation_type:?}" + ); + } + } + } + + #[test] + fn stored_procedure_execution_is_explicitly_ineligible() { + assert!(!is_operation_supported_by_gateway20( + ResourceType::StoredProcedure, + OperationType::Execute + )); + assert!(!is_operation_supported_by_gateway20( + ResourceType::Document, + OperationType::Execute + )); + } +} diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/mod.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/mod.rs index 7c34dc53576..c4e10653d3a 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/mod.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/mod.rs @@ -20,6 +20,8 @@ pub(crate) mod background_task_manager; pub(crate) mod cosmos_headers; pub(crate) mod cosmos_transport_client; mod emulator; +/// Gateway 2.0 operation eligibility filter. 
+pub(crate) mod gateway20_eligibility; pub(crate) mod http_client_factory; pub(crate) mod request_signing; #[cfg(feature = "reqwest")] diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/lib.rs b/sdk/cosmos/azure_data_cosmos_driver/src/lib.rs index 978f929bdc3..15c0537d577 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/lib.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/lib.rs @@ -20,6 +20,7 @@ //! raw bytes (`&[u8]`) and return buffered responses (`Vec`). Serialization is handled by //! the consuming SDK in its native language. +pub mod constants; pub mod diagnostics; pub mod driver; #[cfg(feature = "fault_injection")] diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/models/account_reference.rs b/sdk/cosmos/azure_data_cosmos_driver/src/models/account_reference.rs index 5f32b3d9ba9..0237847e26f 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/models/account_reference.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/models/account_reference.rs @@ -34,6 +34,35 @@ impl AccountEndpoint { self.0.host_str().unwrap_or("") } + /// Returns the global database account name parsed from the endpoint hostname's first label. + /// + /// Returns `None` for emulator, IP literal, and custom-domain hosts. The parsed value is used + /// as the RNTBD `GlobalDatabaseAccountName` metadata token on Gateway 2.0 requests; when it + /// cannot be parsed, Gateway 2.0 requests fall back to standard Gateway for that account. + // Slice 3 reads this value when Gateway 2.0 dispatch is wired in. 
+ #[allow(dead_code)] + pub(crate) fn global_database_account_name(&self) -> Option { + let host = self.host(); + if host.is_empty() { + return None; + } + + if host.starts_with(|c: char| c.is_ascii_digit()) || host.contains(':') { + return None; + } + + let (label, suffix) = host.split_once('.')?; + if label.is_empty() || suffix.is_empty() { + return None; + } + + if !suffix.starts_with("documents.") { + return None; + } + + Some(label.to_owned()) + } + /// Joins a resource path to this endpoint to create a full request URL. /// /// The path should be the resource path (e.g., "/dbs/mydb/colls/mycoll"). @@ -375,6 +404,33 @@ mod tests { assert_eq!(endpoint.host(), "myaccount.documents.azure.com"); } + #[test] + fn global_database_account_name_extracts_only_cosmos_hosts() { + let cases = [ + ("https://myaccount.documents.azure.com/", Some("myaccount")), + ( + "https://my-account-123.documents.azure.com/", + Some("my-account-123"), + ), + ("https://myacct.documents.azure.us/", Some("myacct")), + ("https://myacct.documents.azure.cn:443/", Some("myacct")), + ("https://localhost:8081/", None), + ("https://127.0.0.1:8081/", None), + ("https://[::1]:8081/", None), + ("https://my.custom.domain/", None), + ("https://example.com/", None), + ]; + + for (url, expected) in cases { + let endpoint = AccountEndpoint::try_from(url).unwrap(); + assert_eq!( + endpoint.global_database_account_name().as_deref(), + expected, + "unexpected account name for {url}" + ); + } + } + #[test] fn builder_with_master_key() { let account = From c475d879c932d585a3819d664c1a776345c4b7ee Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Wed, 29 Apr 2026 21:41:32 -0700 Subject: [PATCH 27/48] Add Gateway 2.0 routing eligibility and endpoint key derivation (Slice 3a) Wires the Slice 2 eligibility helper into operation routing and fixes a latent connection-pool keying bug for Gateway 2.0 endpoints. 
Changes:

- pipeline/components.rs: add an endpoint_key field to RoutingDecision so the pool key is captured alongside the chosen URL instead of being recomputed from the underlying CosmosEndpoint cache.
- pipeline/operation_pipeline.rs:
  - resolve_endpoint now considers Gateway 2.0 only when the endpoint advertises it AND the account name is parseable AND the operation is supported by Gateway 2.0 per is_operation_supported_by_gateway20.
  - When routing through Gateway 2.0, endpoint_key is derived from the selected URL's authority; otherwise the existing endpoint cache key is reused.
  - account_endpoint.global_database_account_name().is_some() is inlined at the call site; full Option threading is deferred to Slice 3b/c.
- transport/mod.rs: re-export is_operation_supported_by_gateway20 for pipeline consumers.

Tests:

- Three new resolve_endpoint tests cover (1) an ineligible operation falling back to Gateway, (2) a missing account name falling back to Gateway, and (3) Gateway 2.0 routing producing an endpoint_key derived from the Gateway 2.0 authority (not the standard Gateway endpoint's cached key).
- Existing tests are updated for the new resolve_endpoint signature and the new RoutingDecision field.

Validation: cargo fmt, cargo clippy --all-features --all-targets -- -D warnings, cargo test -p azure_data_cosmos_driver --all-features --lib (724 passed).
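The routing rule and the pool-key fix can be sketched together as follows. This is a simplified model, not the driver's `resolve_endpoint`. `RegionalEndpoint` and `route` are hypothetical names, authorities are modeled as `(host, port)` tuples, and the "endpoint advertises Gateway 2.0" check is folded into the `Option`. The sketch also omits the data-plane preference flag.

```rust
/// Hypothetical, simplified regional endpoint: the standard Gateway authority
/// plus an optional Gateway 2.0 authority, each as (host, port).
pub struct RegionalEndpoint {
    pub gateway: (String, u16),
    pub gateway20: Option<(String, u16)>,
}

/// Returns whether Gateway 2.0 is used and the authority that is both dialed
/// and used as the connection-pool key for this attempt.
pub fn route(
    endpoint: &RegionalEndpoint,
    account_name_present: bool,
    operation_eligible: bool,
) -> (bool, (String, u16)) {
    match &endpoint.gateway20 {
        // All three conditions hold: route through Gateway 2.0 and key the
        // pool on the Gateway 2.0 authority actually being dialed.
        Some(authority) if account_name_present && operation_eligible => {
            (true, authority.clone())
        }
        // Otherwise fall back to standard Gateway and its own authority.
        _ => (false, endpoint.gateway.clone()),
    }
}
```

Returning the authority together with the decision follows the same idea as the new `endpoint_key` field on `RoutingDecision`: the pool key cannot drift from the URL that is actually dialed, so Gateway 2.0 traffic on a different port does not share a pool entry with standard Gateway.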
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../src/driver/pipeline/components.rs | 4 +- .../src/driver/pipeline/operation_pipeline.rs | 181 +++++++++++++++++- .../src/driver/transport/mod.rs | 1 + 3 files changed, 175 insertions(+), 11 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/components.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/components.rs index 6a7479de9d6..2146b075b98 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/components.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/components.rs @@ -17,7 +17,7 @@ use crate::{ driver::{ jitter::with_jitter, routing::{CosmosEndpoint, LocationIndex}, - transport::AuthorizationContext, + transport::{AuthorizationContext, EndpointKey}, }, models::{CosmosResponseHeaders, CosmosStatus}, options::Region, @@ -47,6 +47,8 @@ pub(crate) struct RoutingDecision { pub endpoint: CosmosEndpoint, /// The concrete URL selected for this attempt. pub selected_url: Url, + /// The connection-pool key matching the selected URL's authority. + pub endpoint_key: EndpointKey, /// The transport mode for this attempt. pub transport_mode: TransportMode, } diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/operation_pipeline.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/operation_pipeline.rs index d4cd938e5c0..41cdc4eab43 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/operation_pipeline.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/operation_pipeline.rs @@ -34,8 +34,9 @@ use super::{ }; use crate::driver::transport::{ + is_operation_supported_by_gateway20, transport_pipeline::{execute_transport_pipeline, TransportPipelineContext}, - AuthorizationContext, CosmosTransport, + AuthorizationContext, CosmosTransport, EndpointKey, }; /// Executes a Cosmos DB operation through the new pipeline architecture. 
@@ -117,6 +118,7 @@ pub(crate) async fn execute_operation_pipeline( &retry_state, &location, pipeline_type == PipelineType::DataPlane, + account_endpoint.global_database_account_name().is_some(), location_state_store.endpoint_unavailability_ttl(), ); @@ -199,7 +201,7 @@ pub(crate) async fn execute_operation_pipeline( user_agent, pipeline_type, transport_security, - endpoint_key: routing.endpoint.endpoint_key(), + endpoint_key: routing.endpoint_key.clone(), }, &mut diagnostics, ) @@ -337,6 +339,7 @@ fn resolve_endpoint( retry_state: &OperationRetryState, location: &LocationSnapshot, prefer_gateway20: bool, + account_name_present: bool, endpoint_unavailability_ttl: Duration, ) -> RoutingDecision { let account = location.account.as_ref(); @@ -373,16 +376,28 @@ fn resolve_endpoint( } let selected = selected.unwrap_or_else(|| account.default_endpoint.clone()); - let use_gateway20 = selected.uses_gateway20(prefer_gateway20); + let use_gateway20 = selected.uses_gateway20(prefer_gateway20) + && account_name_present + && is_operation_supported_by_gateway20( + operation.resource_type(), + operation.operation_type(), + ); let transport_mode = if use_gateway20 { TransportMode::Gateway20 } else { TransportMode::Gateway }; + let selected_url = selected.selected_url(use_gateway20).clone(); + let endpoint_key = if use_gateway20 { + EndpointKey::try_from(&selected_url).expect("selected URL must have a valid host and port") + } else { + selected.endpoint_key() + }; RoutingDecision { - selected_url: selected.selected_url(use_gateway20).clone(), + selected_url, endpoint: selected, + endpoint_key, transport_mode, } } @@ -632,6 +647,7 @@ mod tests { driver::{ pipeline::components::{RoutingDecision, TransportMode}, routing::{AccountEndpointState, CosmosEndpoint, LocationIndex, LocationSnapshot}, + transport::EndpointKey, }, models::{ request_header_names, AccountReference, ActivityId, ContainerProperties, @@ -676,6 +692,7 @@ mod tests { 
CosmosEndpoint::global(Url::parse("https://test.documents.azure.com:443/").unwrap()); RoutingDecision { selected_url: endpoint.url().clone(), + endpoint_key: endpoint.endpoint_key(), endpoint, transport_mode: TransportMode::Gateway, } @@ -777,13 +794,16 @@ mod tests { fn build_transport_request_uses_routed_endpoint_url_directly() { let operation = CosmosOperation::read_database(DatabaseReference::from_name(test_account(), "mydb")); + let selected_url = + Url::parse("https://test-westus2-thin.documents.azure.com:444/").unwrap(); let routing = RoutingDecision { endpoint: CosmosEndpoint::regional_with_gateway20( "westus2".into(), Url::parse("https://test-westus2.documents.azure.com:443/").unwrap(), - Url::parse("https://test-westus2-thin.documents.azure.com:444/").unwrap(), + selected_url.clone(), ), - selected_url: Url::parse("https://test-westus2-thin.documents.azure.com:444/").unwrap(), + endpoint_key: EndpointKey::try_from(&selected_url).unwrap(), + selected_url, transport_mode: TransportMode::Gateway20, }; @@ -809,11 +829,12 @@ mod tests { fn build_transport_request_uses_default_url_for_global_endpoint() { let operation = CosmosOperation::read_database(DatabaseReference::from_name(test_account(), "mydb")); + let endpoint = + CosmosEndpoint::global(Url::parse("https://test.documents.azure.com:443/").unwrap()); let routing = RoutingDecision { - endpoint: CosmosEndpoint::global( - Url::parse("https://test.documents.azure.com:443/").unwrap(), - ), - selected_url: Url::parse("https://test.documents.azure.com:443/").unwrap(), + selected_url: endpoint.url().clone(), + endpoint_key: endpoint.endpoint_key(), + endpoint, transport_mode: TransportMode::Gateway, }; @@ -873,6 +894,7 @@ mod tests { &retry_state, &location, false, + true, Duration::from_secs(60), ); assert_eq!(routing.endpoint, write_endpoint); @@ -923,6 +945,7 @@ mod tests { &retry_state, &location, false, + true, Duration::from_secs(60), ); assert_eq!(routing.endpoint, default_endpoint); @@ -971,6 +994,7 @@ 
mod tests { &retry_state, &location, false, + true, Duration::from_secs(60), ); assert_eq!(routing.endpoint, read_endpoint); @@ -1028,6 +1052,7 @@ mod tests { &stale_retry_state, &location, false, + true, Duration::from_secs(60), ); assert_eq!(first_routing.endpoint, endpoint_a); @@ -1041,6 +1066,7 @@ mod tests { &advanced_state, &location, false, + true, Duration::from_secs(60), ); assert_eq!(second_routing.endpoint, endpoint_b); @@ -1184,6 +1210,7 @@ mod tests { &retry_state, &location, true, + true, Duration::from_secs(60), ); assert_eq!(routing.endpoint, endpoint); @@ -1194,6 +1221,139 @@ mod tests { ); } + #[test] + fn resolve_endpoint_falls_back_to_gateway_when_op_ineligible_for_gateway20() { + let operation = CosmosOperation::read_all_databases(test_account()); + let endpoint = CosmosEndpoint::regional_with_gateway20( + "westus2".into(), + Url::parse("https://test-westus2.documents.azure.com:443/").unwrap(), + Url::parse("https://test-westus2-thin.documents.azure.com:444/").unwrap(), + ); + + let location = LocationSnapshot::for_tests(Arc::new(AccountEndpointState { + generation: 0, + preferred_read_endpoints: vec![endpoint.clone()].into(), + preferred_write_endpoints: vec![endpoint.clone()].into(), + unavailable_endpoints: Default::default(), + multiple_write_locations_enabled: false, + default_endpoint: endpoint.clone(), + })); + + let retry_state = crate::driver::pipeline::components::OperationRetryState::initial( + 0, + false, + Vec::new(), + 3, + 2, + ); + + let routing = super::resolve_endpoint( + &operation, + &retry_state, + &location, + true, + true, + Duration::from_secs(60), + ); + + assert_eq!(routing.transport_mode, TransportMode::Gateway); + assert_eq!(routing.selected_url, *endpoint.url()); + } + + #[test] + fn resolve_endpoint_falls_back_to_gateway_when_account_name_unparseable() { + let operation = CosmosOperation::read_item(ItemReference::from_name( + &test_container(), + PartitionKey::from("pk1"), + "doc1", + )); + let endpoint = 
CosmosEndpoint::regional_with_gateway20( + "westus2".into(), + Url::parse("https://test-westus2.documents.azure.com:443/").unwrap(), + Url::parse("https://test-westus2-thin.documents.azure.com:444/").unwrap(), + ); + + let location = LocationSnapshot::for_tests(Arc::new(AccountEndpointState { + generation: 0, + preferred_read_endpoints: vec![endpoint.clone()].into(), + preferred_write_endpoints: vec![endpoint.clone()].into(), + unavailable_endpoints: Default::default(), + multiple_write_locations_enabled: false, + default_endpoint: endpoint.clone(), + })); + + let retry_state = crate::driver::pipeline::components::OperationRetryState::initial( + 0, + false, + Vec::new(), + 3, + 2, + ); + + let routing = super::resolve_endpoint( + &operation, + &retry_state, + &location, + true, + false, + Duration::from_secs(60), + ); + + assert_eq!(routing.transport_mode, TransportMode::Gateway); + assert_eq!(routing.selected_url, *endpoint.url()); + } + + #[test] + fn resolve_endpoint_uses_gateway20_authority_for_endpoint_key() { + let operation = CosmosOperation::read_item(ItemReference::from_name( + &test_container(), + PartitionKey::from("pk1"), + "doc1", + )); + let gateway20_url = Url::parse("https://central.thinclient.azure.com:444/").unwrap(); + let endpoint = CosmosEndpoint::regional_with_gateway20( + "centralus".into(), + Url::parse("https://central.documents.azure.com:443/").unwrap(), + gateway20_url.clone(), + ); + + let location = LocationSnapshot::for_tests(Arc::new(AccountEndpointState { + generation: 0, + preferred_read_endpoints: vec![endpoint.clone()].into(), + preferred_write_endpoints: vec![endpoint.clone()].into(), + unavailable_endpoints: Default::default(), + multiple_write_locations_enabled: false, + default_endpoint: endpoint, + })); + + let retry_state = crate::driver::pipeline::components::OperationRetryState::initial( + 0, + false, + Vec::new(), + 3, + 2, + ); + + let routing = super::resolve_endpoint( + &operation, + &retry_state, + &location, + true, 
+ true, + Duration::from_secs(60), + ); + + assert_eq!(routing.transport_mode, TransportMode::Gateway20); + assert_eq!( + routing.selected_url.host_str(), + Some("central.thinclient.azure.com") + ); + assert_eq!( + routing.endpoint_key, + EndpointKey::try_from(&gateway20_url).unwrap() + ); + } + #[test] fn resolve_endpoint_skips_unavailable_region_when_gateway20_is_present() { let operation = CosmosOperation::read_item(ItemReference::from_name( @@ -1242,6 +1402,7 @@ mod tests { &retry_state, &location, true, + true, Duration::from_secs(60), ); assert_eq!(routing.endpoint, fallback_endpoint); diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/mod.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/mod.rs index c4e10653d3a..991fcde35f0 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/mod.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/mod.rs @@ -22,6 +22,7 @@ pub(crate) mod cosmos_transport_client; mod emulator; /// Gateway 2.0 operation eligibility filter. pub(crate) mod gateway20_eligibility; +pub(crate) use gateway20_eligibility::is_operation_supported_by_gateway20; pub(crate) mod http_client_factory; pub(crate) mod request_signing; #[cfg(feature = "reqwest")] From 696362660716b192213a1bb5284e9b8d65e7872f Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Wed, 29 Apr 2026 23:18:13 -0700 Subject: [PATCH 28/48] Add Gateway 2.0 RNTBD dispatch (Slice 3b/c) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Final slice of Gateway 2.0 vertical-slice plan §5: wrap a signed Cosmos HTTP request as an RNTBD request frame on the way out, and decode the proxy's RNTBD response frame back into a synthetic `HttpResponse` on the way in. With Slice 3a's routing eligibility gating, point operations on fully-specified partition keys now flow end-to-end over Gateway 2.0 when the routing decision selects it. 
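A minimal sketch of the Slice 3a routing gate, modeled on the `resolve_endpoint` change earlier in this series. An attempt goes over Gateway 2.0 only when four things hold: the endpoint advertises a Gateway 2.0 URL, the pipeline prefers Gateway 2.0 (data-plane), the account name is known, and the operation is eligible. Otherwise it falls back to the regular gateway URL. `Endpoint`, `route`, and the boolean parameters are hypothetical simplifications. The sketch also assumes that `uses_gateway20(prefer_gateway20)` reduces to "has a Gateway 2.0 URL and Gateway 2.0 is preferred".

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TransportMode {
    Gateway,
    Gateway20,
}

/// Hypothetical, simplified endpoint: a regional gateway URL plus an
/// optional Gateway 2.0 URL (the driver's `CosmosEndpoint` carries more state).
struct Endpoint {
    gateway_url: String,
    gateway20_url: Option<String>,
}

/// Returns the transport mode and the URL this attempt should be sent to.
/// Mirrors `uses_gateway20(prefer) && account_name_present && is_operation_supported_by_gateway20(..)`.
fn route<'a>(
    endpoint: &'a Endpoint,
    prefer_gateway20: bool,
    account_name_present: bool,
    operation_supported: bool,
) -> (TransportMode, &'a str) {
    match endpoint.gateway20_url.as_deref() {
        // All conditions hold: dispatch to the Gateway 2.0 authority.
        Some(url) if prefer_gateway20 && account_name_present && operation_supported => {
            (TransportMode::Gateway20, url)
        }
        // Any condition fails: fall back to the regular gateway URL.
        _ => (TransportMode::Gateway, endpoint.gateway_url.as_str()),
    }
}
```

The fallback is silent by design: an ineligible operation or an unparseable account name still succeeds over the regular gateway rather than failing the request.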
The wrap helper consumes a signed `HttpRequest` and reuses the exact `x-ms-date` and `Authorization` values written by `sign_request`, so signature verification still holds. It packs up to 11 RNTBD request tokens for the proxy (Authorization, PayloadPresent, Date, ConsistencyLevel, DatabaseName, CollectionName, DocumentName, TransportRequestId, EffectivePartitionKey, SDKSupportedCapabilities, GlobalDatabaseAccountName). It then returns a brand-new `HttpRequest` carrying only the outer `User-Agent` and `x-ms-activity-id` headers. EffectivePartitionKey is included only when both a partition key and its definition are available. DocumentName is included only for non-Create document operations whose resource link names a document.

The unwrap helper runs only on an outer HTTP 200. Outer non-200 responses (proxy or transport errors) pass through unchanged, so they surface with their actual status rather than a synthesized `TRANSPORT_GENERATED_503`.

Consistency resolution lives in a new `resolve_effective_consistency` helper that implements the precedence chain from spec §5.2 at operation-pipeline scope. The result flows through `TransportRequest`, so the wrap helper never inspects HTTP headers for consistency.

Failure modes:

- Wrap failure (missing signed header, malformed activity_id, bad resource link, missing account name) → `CLIENT_GENERATED_400` TransportError, `RequestSentStatus::NotSent`. Adds a new `SubStatusCode::CLIENT_GENERATED_400` (20400) mirroring the existing `CLIENT_GENERATED_401` pattern.
- Unwrap failure (outer 200 with an undecodable RNTBD body, or an inner status outside 100..=599) → `TRANSPORT_GENERATED_503`, `RequestSentStatus::Sent`.
- Outer non-200 → outer triple passes through unchanged, no unwrap.

Files:

- `transport/gateway20_dispatch.rs` (new): `WrapInputs`, `wrap_request_for_gateway20`, `unwrap_response_for_gateway20`, `parse_resource_names`, `effective_partition_key_bytes`, `next_transport_request_id` (process-wide AtomicU32), 13 unit tests.
- `transport/rntbd/tokens.rs`: new `RntbdRequestToken` enum with 11 verified IDs and `Token::*` named constructors. IDs cross-referenced against Java's `RntbdConstants.RntbdRequestHeader`.
- `transport/transport_pipeline.rs`: wrap call site between `sign_request` and `execute_http_attempt` (gated on `transport_mode == Gateway20`); unwrap call site inside `finalize_http_attempt`'s Response branch before `map_http_response_payload` (gated on `transport_mode == Gateway20` AND outer status == 200); new `gateway20_wrap_error_result`, `gateway20_unwrap_error_result`; `TransportPipelineContext` carries `account_name`; 6 new integration tests.
- `pipeline/components.rs`: `TransportRequest` carries `transport_mode`, `operation_type`, `partition_key`, `partition_key_definition`, `effective_consistency`.
- `pipeline/operation_pipeline.rs`: real `account_name: Option` binding, `effective_consistency` resolved at op-pipeline scope, all five new fields populated in `build_transport_request`.
- `options/read_consistency.rs`: new `resolve_effective_consistency(strategy, account_default) -> DefaultConsistencyLevel` per spec §5.2, with 4×5 table test.
- `models/cosmos_status.rs`: `SubStatusCode::CLIENT_GENERATED_400` (20400) and `CosmosStatus::CLIENT_GENERATED_400`.
- `transport/cosmos_headers.rs`, `transport/mod.rs`, `options/mod.rs`: exports for the new helpers/constants.

20 new tests; 744 passed, 0 failed.
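The token layout can be read off the test helper `parse_wrapped_request` in `gateway20_dispatch.rs`. Each metadata token is a `u16` ID, then a one-byte type code, then the value. The type codes are: 0x00 for a single byte, 0x02 for a `u32`, 0x07 for a string with a one-byte length prefix, and 0x08 for a string with a two-byte length prefix.

The sketch below encodes that subset. Little-endian byte order is an assumption, suggested by the helper decoding doubles with `f64::from_le_bytes`. `TokenValue` and `encode_token` are hypothetical names, not the driver's `Token` API.

```rust
/// Hypothetical subset of RNTBD token value types; type codes mirror the
/// ones decoded by the test helper in `gateway20_dispatch.rs`.
enum TokenValue {
    Byte(u8),
    ULong(u32),
    SmallString(String),
    String(String),
}

/// Appends one token as `id: u16`, `type: u8`, then the value.
/// Little-endian encoding is assumed for all multi-byte integers.
fn encode_token(out: &mut Vec<u8>, id: u16, value: &TokenValue) {
    out.extend_from_slice(&id.to_le_bytes());
    match value {
        TokenValue::Byte(b) => {
            out.push(0x00);
            out.push(*b);
        }
        TokenValue::ULong(v) => {
            out.push(0x02);
            out.extend_from_slice(&v.to_le_bytes());
        }
        TokenValue::SmallString(s) => {
            // One-byte length prefix: the value must fit in 255 bytes.
            assert!(s.len() <= u8::MAX as usize, "SmallString longer than 255 bytes");
            out.push(0x07);
            out.push(s.len() as u8);
            out.extend_from_slice(s.as_bytes());
        }
        TokenValue::String(s) => {
            // Two-byte length prefix: the value must fit in 65535 bytes.
            assert!(s.len() <= u16::MAX as usize, "String longer than 65535 bytes");
            out.push(0x08);
            out.extend_from_slice(&(s.len() as u16).to_le_bytes());
            out.extend_from_slice(s.as_bytes());
        }
    }
}
```

In the tests, the Date token (ID 0x0003) uses the SmallString form, so the RFC 1123 date must fit in 255 bytes. Authorization (ID 0x0001) uses the two-byte-length String form, which leaves room for long signatures.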
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../src/driver/pipeline/components.rs | 15 +- .../src/driver/pipeline/operation_pipeline.rs | 36 +- .../src/driver/transport/cosmos_headers.rs | 3 +- .../driver/transport/gateway20_dispatch.rs | 864 ++++++++++++++++++ .../src/driver/transport/mod.rs | 4 + .../src/driver/transport/rntbd/tokens.rs | 141 ++- .../driver/transport/transport_pipeline.rs | 523 ++++++++++- .../src/models/cosmos_status.rs | 14 + .../src/options/mod.rs | 1 + .../src/options/read_consistency.rs | 46 + 10 files changed, 1632 insertions(+), 15 deletions(-) create mode 100644 sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/gateway20_dispatch.rs diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/components.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/components.rs index 2146b075b98..bc1b15e189a 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/components.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/components.rs @@ -19,7 +19,10 @@ use crate::{ routing::{CosmosEndpoint, LocationIndex}, transport::{AuthorizationContext, EndpointKey}, }, - models::{CosmosResponseHeaders, CosmosStatus}, + models::{ + CosmosResponseHeaders, CosmosStatus, DefaultConsistencyLevel, OperationType, PartitionKey, + PartitionKeyDefinition, + }, options::Region, }; @@ -179,6 +182,16 @@ pub(crate) struct TransportRequest { pub method: Method, /// The endpoint selected for this attempt. pub endpoint: CosmosEndpoint, + /// The routed transport mode for this attempt. + pub transport_mode: TransportMode, + /// The operation type being dispatched. + pub operation_type: OperationType, + /// Partition key for item-scoped Gateway 2.0 dispatch. + pub partition_key: Option, + /// Partition key definition for effective partition key computation. + pub partition_key_definition: Option, + /// Effective consistency resolved from account default and read options. 
+ pub effective_consistency: DefaultConsistencyLevel, /// The fully resolved URL for this attempt. pub url: Url, /// Headers to send (includes operation-specific and attempt-specific headers). diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/operation_pipeline.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/operation_pipeline.rs index 41cdc4eab43..85c43c69dc8 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/operation_pipeline.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/operation_pipeline.rs @@ -22,7 +22,10 @@ use crate::{ request_header_names, AccountEndpoint, ActivityId, CosmosOperation, CosmosResponse, Credential, DefaultConsistencyLevel, OperationType, SessionToken, SubStatusCode, }, - options::{OperationOptionsView, ReadConsistencyStrategy, ThroughputControlGroupSnapshot}, + options::{ + resolve_effective_consistency, OperationOptionsView, ReadConsistencyStrategy, + ThroughputControlGroupSnapshot, + }, }; use super::{ @@ -74,6 +77,8 @@ pub(crate) async fn execute_operation_pipeline( .read_consistency_strategy() .copied() .unwrap_or(ReadConsistencyStrategy::Default); + let effective_consistency = + resolve_effective_consistency(read_consistency_strategy, account_default_consistency); let session_consistency_active = !session_capturing_disabled && read_consistency_strategy.is_session_effective(account_default_consistency); let max_session_retries = options @@ -113,12 +118,13 @@ pub(crate) async fn execute_operation_pipeline( let location = location_state_store.snapshot(); // ── STAGE 2: Resolve endpoint ────────────────────────────────── + let account_name = account_endpoint.global_database_account_name(); let routing = resolve_endpoint( operation, &retry_state, &location, pipeline_type == PipelineType::DataPlane, - account_endpoint.global_database_account_name().is_some(), + account_name.is_some(), location_state_store.endpoint_unavailability_ttl(), ); @@ -139,6 +145,7 @@ pub(crate) async 
fn execute_operation_pipeline( activity_id, execution_context, deadline, + effective_consistency, resolved_session_token: session_consistency_active .then(|| { session_manager.resolve_session_token( @@ -202,6 +209,7 @@ pub(crate) async fn execute_operation_pipeline( pipeline_type, transport_security, endpoint_key: routing.endpoint_key.clone(), + account_name: account_name.clone(), }, &mut diagnostics, ) @@ -447,6 +455,7 @@ struct TransportRequestContext<'a> { activity_id: &'a ActivityId, execution_context: ExecutionContext, deadline: Option, + effective_consistency: DefaultConsistencyLevel, resolved_session_token: Option, throughput_control: Option<&'a ThroughputControlGroupSnapshot>, } @@ -560,6 +569,13 @@ fn build_transport_request( Ok(TransportRequest { method, endpoint: ctx.routing.endpoint.clone(), + transport_mode: ctx.routing.transport_mode, + operation_type: operation.operation_type(), + partition_key: operation.partition_key().cloned(), + partition_key_definition: operation + .container() + .map(|container| container.partition_key_definition().clone()), + effective_consistency: ctx.effective_consistency, url, headers, body: operation.body().map(azure_core::Bytes::copy_from_slice), @@ -651,8 +667,9 @@ mod tests { }, models::{ request_header_names, AccountReference, ActivityId, ContainerProperties, - ContainerReference, CosmosOperation, DatabaseReference, ItemReference, PartitionKey, - PartitionKeyDefinition, SystemProperties, ThroughputControlGroupName, + ContainerReference, CosmosOperation, DatabaseReference, DefaultConsistencyLevel, + ItemReference, PartitionKey, PartitionKeyDefinition, SystemProperties, + ThroughputControlGroupName, }, options::{PriorityLevel, ThroughputControlGroupSnapshot}, }; @@ -709,6 +726,7 @@ mod tests { activity_id: &activity_id, execution_context: ExecutionContext::Initial, deadline: None, + effective_consistency: DefaultConsistencyLevel::Session, resolved_session_token: None, throughput_control: None, }; @@ -730,6 +748,7 @@ mod 
tests { activity_id: &activity_id, execution_context: ExecutionContext::Initial, deadline: None, + effective_consistency: DefaultConsistencyLevel::Session, resolved_session_token: None, throughput_control: None, }; @@ -751,6 +770,7 @@ mod tests { activity_id: &activity_id, execution_context: ExecutionContext::Initial, deadline: None, + effective_consistency: DefaultConsistencyLevel::Session, resolved_session_token: None, throughput_control: None, }; @@ -777,6 +797,7 @@ mod tests { activity_id: &activity_id, execution_context: ExecutionContext::Retry, deadline: Some(std::time::Instant::now() + Duration::from_secs(5)), + effective_consistency: DefaultConsistencyLevel::Session, resolved_session_token: None, throughput_control: None, }; @@ -813,6 +834,7 @@ mod tests { activity_id: &activity_id, execution_context: ExecutionContext::Initial, deadline: None, + effective_consistency: DefaultConsistencyLevel::Session, resolved_session_token: None, throughput_control: None, }; @@ -844,6 +866,7 @@ mod tests { activity_id: &activity_id, execution_context: ExecutionContext::Initial, deadline: None, + effective_consistency: DefaultConsistencyLevel::Session, resolved_session_token: None, throughput_control: None, }; @@ -1420,6 +1443,7 @@ mod tests { activity_id: &activity_id, execution_context: ExecutionContext::Initial, deadline: None, + effective_consistency: DefaultConsistencyLevel::Session, resolved_session_token: None, throughput_control: None, }; @@ -1445,6 +1469,7 @@ mod tests { activity_id: &activity_id, execution_context: ExecutionContext::Initial, deadline: None, + effective_consistency: DefaultConsistencyLevel::Session, resolved_session_token: None, throughput_control: None, }; @@ -1483,6 +1508,7 @@ mod tests { activity_id: &activity_id, execution_context: ExecutionContext::Initial, deadline: None, + effective_consistency: DefaultConsistencyLevel::Session, resolved_session_token: None, throughput_control: Some(&snapshot), }; @@ -1526,6 +1552,7 @@ mod tests { 
activity_id: &activity_id, execution_context: ExecutionContext::Initial, deadline: None, + effective_consistency: DefaultConsistencyLevel::Session, resolved_session_token: None, throughput_control: Some(&snapshot), }; @@ -1570,6 +1597,7 @@ mod tests { activity_id: &activity_id, execution_context: ExecutionContext::Initial, deadline: None, + effective_consistency: DefaultConsistencyLevel::Session, resolved_session_token: None, throughput_control: Some(&snapshot), }; diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/cosmos_headers.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/cosmos_headers.rs index 6f9eeab5d2d..80b9553d3fd 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/cosmos_headers.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/cosmos_headers.rs @@ -15,7 +15,8 @@ const SDK_SUPPORTED_CAPABILITIES: HeaderName = HeaderName::from_static("x-ms-cosmos-sdk-supportedcapabilities"); const PARTITION_MERGE_BIT: u32 = 1; const IGNORE_UNKNOWN_RNTBD_TOKENS_BIT: u32 = 8; -const SUPPORTED_CAPABILITIES_BITS: u32 = PARTITION_MERGE_BIT | IGNORE_UNKNOWN_RNTBD_TOKENS_BIT; +pub(crate) const SUPPORTED_CAPABILITIES_BITS: u32 = + PARTITION_MERGE_BIT | IGNORE_UNKNOWN_RNTBD_TOKENS_BIT; const _: () = assert!(SUPPORTED_CAPABILITIES_BITS == 9); /// String-encoded SDK capabilities bitmask. /// diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/gateway20_dispatch.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/gateway20_dispatch.rs new file mode 100644 index 00000000000..9f9d963ddef --- /dev/null +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/gateway20_dispatch.rs @@ -0,0 +1,864 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +//! Gateway 2.0 HTTP dispatch helpers. 
+ +use std::sync::atomic::{AtomicU32, Ordering}; + +use azure_core::{ + error::ErrorKind, + http::{ + headers::{HeaderName, HeaderValue, Headers, AUTHORIZATION, USER_AGENT}, + Method, + }, +}; +use uuid::Uuid; + +use crate::models::{ + cosmos_headers::response_header_names, effective_partition_key::EffectivePartitionKey, + DefaultConsistencyLevel, OperationType, PartitionKey, PartitionKeyDefinition, ResourceType, +}; + +use super::{ + cosmos_headers::SUPPORTED_CAPABILITIES_BITS, + cosmos_transport_client::{HttpRequest, HttpResponse}, + rntbd::{RntbdRequestFrame, RntbdResponse, Token}, + AuthorizationContext, +}; + +const X_MS_ACTIVITY_ID: HeaderName = HeaderName::from_static("x-ms-activity-id"); +const X_MS_DATE: HeaderName = HeaderName::from_static("x-ms-date"); +const X_MS_LSN: HeaderName = HeaderName::from_static("x-ms-lsn"); +const X_MS_GLOBAL_COMMITTED_LSN: HeaderName = HeaderName::from_static("x-ms-global-committed-lsn"); +static TRANSPORT_REQUEST_ID: AtomicU32 = AtomicU32::new(0); + +/// Inputs resolved by the operation pipeline before a Gateway 2.0 dispatch. +pub(crate) struct WrapInputs<'a> { + pub(crate) auth_context: &'a AuthorizationContext, + pub(crate) operation_type: OperationType, + pub(crate) resource_type: ResourceType, + pub(crate) partition_key: Option<&'a PartitionKey>, + pub(crate) partition_key_definition: Option<&'a PartitionKeyDefinition>, + pub(crate) effective_consistency: DefaultConsistencyLevel, + pub(crate) account_name: Option<&'a str>, +} + +/// Wraps a signed Cosmos HTTP request into a Gateway 2.0 RNTBD request frame. 
+pub(crate) fn wrap_request_for_gateway20( + request: &HttpRequest, + inputs: &WrapInputs<'_>, +) -> azure_core::Result { + let authorization = required_header(request, &AUTHORIZATION, "authorization")?; + let date = required_header(request, &X_MS_DATE, "x-ms-date")?; + let activity_id = required_header(request, &X_MS_ACTIVITY_ID, "x-ms-activity-id")?; + let activity_id = Uuid::parse_str(&activity_id) + .map_err(|e| data_conversion_error(format!("x-ms-activity-id is not a valid UUID: {e}")))?; + let account_name = inputs + .account_name + .filter(|value| !value.is_empty()) + .ok_or_else(|| data_conversion_error("Gateway 2.0 dispatch requires an account name"))?; + + let resource_names = parse_resource_names(inputs.auth_context.resource_link.as_str())?; + let has_payload = request.body.as_ref().is_some_and(|body| !body.is_empty()); + + let mut metadata = Vec::with_capacity(11); + if let Some(epk) = effective_partition_key_bytes(inputs)? { + metadata.push(Token::effective_partition_key(epk)); + } + metadata.push(Token::global_database_account_name(account_name.to_owned())); + metadata.push(Token::database_name(resource_names.database)); + metadata.push(Token::collection_name(resource_names.collection)); + metadata.push(Token::payload_present(has_payload)); + if inputs.resource_type == ResourceType::Document + && inputs.operation_type != OperationType::Create + { + if let Some(document) = resource_names.document { + metadata.push(Token::document_name(document)); + } + } + metadata.push(Token::authorization_token(authorization)); + metadata.push(Token::date(date)); + metadata.push(Token::consistency_level(inputs.effective_consistency)); + metadata.push(Token::transport_request_id(next_transport_request_id())); + metadata.push(Token::sdk_supported_capabilities( + SUPPORTED_CAPABILITIES_BITS, + )); + + let frame = RntbdRequestFrame { + resource_type: inputs.resource_type, + operation_type: inputs.operation_type, + activity_id, + metadata, + body: if has_payload { + 
request.body.as_ref().map(|body| body.to_vec()) + } else { + None + }, + } + .serialize()?; + + let mut headers = Headers::new(); + if let Some(user_agent) = request.headers.get_optional_str(&USER_AGENT) { + headers.insert(USER_AGENT, HeaderValue::from(user_agent.to_owned())); + } + headers.insert(X_MS_ACTIVITY_ID, HeaderValue::from(activity_id.to_string())); + + Ok(HttpRequest { + url: request.url.clone(), + method: Method::Post, + headers, + body: Some(bytes::Bytes::from(frame)), + timeout: request.timeout, + #[cfg(feature = "fault_injection")] + evaluation_collector: request.evaluation_collector.clone(), + }) +} + +/// Decodes a Gateway 2.0 RNTBD response body into a synthetic HTTP response. +pub(crate) fn unwrap_response_for_gateway20( + response: HttpResponse, +) -> azure_core::Result { + let response = RntbdResponse::deserialize(&response.body)?; + let status = u16::from(response.status.status_code()); + if !(100..=599).contains(&status) { + return Err(data_conversion_error(format!( + "Gateway 2.0 RNTBD response contained invalid HTTP status {status}" + ))); + } + + let mut headers = Headers::new(); + headers.insert( + response_header_names::ACTIVITY_ID, + response.activity_id.to_string(), + ); + if let Some(charge) = response.request_charge { + headers.insert(response_header_names::REQUEST_CHARGE, charge.to_string()); + } + if let Some(token) = response.session_token { + headers.insert(response_header_names::SESSION_TOKEN, token); + } + if let Some(etag) = response.etag { + headers.insert(response_header_names::ETAG, etag); + } + if let Some(continuation) = response.continuation_token { + headers.insert(response_header_names::CONTINUATION, continuation); + } + if let Some(substatus) = response.status.sub_status() { + headers.insert( + response_header_names::SUBSTATUS, + substatus.value().to_string(), + ); + } + if let Some(retry_after_ms) = response.retry_after_ms { + headers.insert("x-ms-retry-after-ms", retry_after_ms.to_string()); + } + if let Some(lsn) = 
response.lsn.filter(|value| *value != 0) { + let value = lsn.to_string(); + headers.insert(response_header_names::LSN, value.clone()); + headers.insert(X_MS_LSN, value); + } + if let Some(item_lsn) = response.item_lsn.filter(|value| *value != 0) { + headers.insert(response_header_names::ITEM_LSN, item_lsn.to_string()); + } + if let Some(global_committed_lsn) = response.global_committed_lsn.filter(|value| *value != 0) { + headers.insert(X_MS_GLOBAL_COMMITTED_LSN, global_committed_lsn.to_string()); + } + if let Some(owner_full_name) = response.owner_full_name { + headers.insert(response_header_names::OWNER_FULL_NAME, owner_full_name); + } + + Ok(HttpResponse { + status, + headers, + body: response.body, + }) +} + +fn required_header( + request: &HttpRequest, + header_name: &HeaderName, + display_name: &'static str, +) -> azure_core::Result { + request + .headers + .get_optional_str(header_name) + .map(str::to_owned) + .ok_or_else(|| data_conversion_error(format!("missing required {display_name} header"))) +} + +fn next_transport_request_id() -> u32 { + TRANSPORT_REQUEST_ID.fetch_add(1, Ordering::Relaxed) +} + +fn effective_partition_key_bytes(inputs: &WrapInputs<'_>) -> azure_core::Result>> { + match (inputs.partition_key, inputs.partition_key_definition) { + (Some(partition_key), Some(partition_key_definition)) => { + let epk = EffectivePartitionKey::compute( + partition_key.values(), + partition_key_definition.kind(), + partition_key_definition.version(), + ); + hex_to_bytes(epk.as_str()).map(Some) + } + _ => Ok(None), + } +} + +fn hex_to_bytes(value: &str) -> azure_core::Result> { + if value.len() & 1 != 0 { + return Err(data_conversion_error(format!( + "effective partition key hex length {} is not even", + value.len() + ))); + } + + let mut bytes = Vec::with_capacity(value.len() / 2); + for chunk in value.as_bytes().chunks_exact(2) { + let hi = hex_digit(chunk[0])?; + let lo = hex_digit(chunk[1])?; + bytes.push((hi << 4) | lo); + } + Ok(bytes) +} + +fn 
hex_digit(value: u8) -> azure_core::Result { + match value { + b'0'..=b'9' => Ok(value - b'0'), + b'a'..=b'f' => Ok(value - b'a' + 10), + b'A'..=b'F' => Ok(value - b'A' + 10), + _ => Err(data_conversion_error(format!( + "invalid effective partition key hex digit 0x{value:02X}" + ))), + } +} + +struct ResourceNames { + database: String, + collection: String, + document: Option, +} + +fn parse_resource_names(resource_link: &str) -> azure_core::Result { + let mut database = None; + let mut collection = None; + let mut document = None; + let mut segments = resource_link + .trim_matches('/') + .split('/') + .filter(|segment| !segment.is_empty()); + + while let Some(kind) = segments.next() { + let Some(name) = segments.next() else { + break; + }; + match kind { + "dbs" => database = Some(name.to_owned()), + "colls" => collection = Some(name.to_owned()), + "docs" => document = Some(name.to_owned()), + _ => {} + } + } + + let database = database.filter(|value| !value.is_empty()).ok_or_else(|| { + data_conversion_error("Gateway 2.0 resource link is missing database name") + })?; + let collection = collection + .filter(|value| !value.is_empty()) + .ok_or_else(|| { + data_conversion_error("Gateway 2.0 resource link is missing collection name") + })?; + + Ok(ResourceNames { + database, + collection, + document, + }) +} + +fn data_conversion_error(message: impl Into) -> azure_core::Error { + azure_core::Error::with_message(ErrorKind::DataConversion, message.into()) +} + +#[cfg(test)] +mod tests { + use std::{borrow::Cow, collections::HashMap}; + + use azure_core::http::headers::{ACCEPT, CONTENT_TYPE}; + + use super::*; + use crate::models::{PartitionKeyKind, PartitionKeyVersion}; + + const ACTIVITY_ID: &str = "00112233-4455-6677-8899-aabbccddeeff"; + + #[derive(Clone, Debug, PartialEq)] + enum ParsedTokenValue { + Byte(u8), + ULong(u32), + LongLong(i64), + Double(f64), + SmallString(String), + String(String), + Bytes(Vec), + } + + #[derive(Debug)] + struct ParsedRequest { + 
resource_type: u16, + operation_type: u16, + activity_id: Uuid, + tokens: HashMap, + body: Option>, + } + + fn signed_request(body: Option<&[u8]>) -> HttpRequest { + let mut headers = Headers::new(); + headers.insert(AUTHORIZATION, "auth-token"); + headers.insert(X_MS_DATE, "Wed, 21 Oct 2015 07:28:00 GMT"); + headers.insert(X_MS_ACTIVITY_ID, ACTIVITY_ID); + headers.insert(USER_AGENT, "test-agent"); + headers.insert(CONTENT_TYPE, "application/json"); + headers.insert(ACCEPT, "application/json"); + + HttpRequest { + url: "https://account-thin.documents.azure.com:444/dbs/db1/colls/coll1/docs/doc1" + .parse() + .unwrap(), + method: Method::Get, + headers, + body: body.map(bytes::Bytes::copy_from_slice), + timeout: None, + #[cfg(feature = "fault_injection")] + evaluation_collector: None, + } + } + + fn wrap_inputs<'a>( + auth_context: &'a AuthorizationContext, + operation_type: OperationType, + partition_key: Option<&'a PartitionKey>, + partition_key_definition: Option<&'a PartitionKeyDefinition>, + ) -> WrapInputs<'a> { + WrapInputs { + auth_context, + operation_type, + resource_type: ResourceType::Document, + partition_key, + partition_key_definition, + effective_consistency: DefaultConsistencyLevel::Session, + account_name: Some("account"), + } + } + + fn parse_wrapped_request(request: &HttpRequest, token_count: usize) -> ParsedRequest { + let mut src = request.body.as_ref().unwrap().as_ref(); + let total_len = take_u32(&mut src) as usize; + assert_eq!(total_len, request.body.as_ref().unwrap().len()); + let resource_type = take_u16(&mut src); + let operation_type = take_u16(&mut src); + let activity_id = take_uuid(&mut src); + + let mut tokens = HashMap::new(); + for _ in 0..token_count { + let id = take_u16(&mut src); + let token_type = take_u8(&mut src); + let value = parse_token_value(token_type, &mut src); + tokens.insert(id, value); + } + + let body = if src.is_empty() { + None + } else { + let body_len = take_u32(&mut src) as usize; + assert_eq!(src.len(), 
body_len); + Some(src.to_vec()) + }; + + ParsedRequest { + resource_type, + operation_type, + activity_id, + tokens, + body, + } + } + + fn parse_token_value(token_type: u8, src: &mut &[u8]) -> ParsedTokenValue { + match token_type { + 0x00 => ParsedTokenValue::Byte(take_u8(src)), + 0x02 => ParsedTokenValue::ULong(take_u32(src)), + 0x05 => ParsedTokenValue::LongLong(take_i64(src)), + 0x07 => { + let len = take_u8(src) as usize; + ParsedTokenValue::SmallString(take_string(src, len)) + } + 0x08 => { + let len = take_u16(src) as usize; + ParsedTokenValue::String(take_string(src, len)) + } + 0x0B => { + let len = take_u16(src) as usize; + ParsedTokenValue::Bytes(take_bytes(src, len).to_vec()) + } + 0x0E => ParsedTokenValue::Double(f64::from_le_bytes(take_array(src))), + other => panic!("unexpected token type 0x{other:02X}"), + } + } + + #[test] + fn wrap_builds_required_request_tokens_for_read() { + let request = signed_request(None); + let auth_context = AuthorizationContext::new( + Method::Get, + ResourceType::Document, + "dbs/db1/colls/coll1/docs/doc1", + ); + + let wrapped = wrap_request_for_gateway20( + &request, + &wrap_inputs(&auth_context, OperationType::Read, None, None), + ) + .unwrap(); + let parsed = parse_wrapped_request(&wrapped, 10); + + assert_eq!(wrapped.method, Method::Post); + assert_eq!(parsed.resource_type, 0x0003); + assert_eq!(parsed.operation_type, 0x0003); + assert_eq!(parsed.activity_id, Uuid::parse_str(ACTIVITY_ID).unwrap()); + assert_eq!( + parsed.tokens[&0x0001], + ParsedTokenValue::String("auth-token".into()) + ); + assert_eq!(parsed.tokens[&0x0002], ParsedTokenValue::Byte(0)); + assert_eq!( + parsed.tokens[&0x0003], + ParsedTokenValue::SmallString("Wed, 21 Oct 2015 07:28:00 GMT".into()) + ); + assert_eq!(parsed.tokens[&0x0010], ParsedTokenValue::Byte(0x02)); + assert_eq!( + parsed.tokens[&0x0015], + ParsedTokenValue::String("db1".into()) + ); + assert_eq!( + parsed.tokens[&0x0016], + ParsedTokenValue::String("coll1".into()) + ); + 
assert_eq!( + parsed.tokens[&0x0017], + ParsedTokenValue::String("doc1".into()) + ); + assert_eq!( + parsed.tokens[&0x004D], + ParsedTokenValue::ULong(parsed_transport_id(&parsed)) + ); + assert_eq!( + parsed.tokens[&0x00A2], + ParsedTokenValue::ULong(SUPPORTED_CAPABILITIES_BITS) + ); + assert_eq!( + parsed.tokens[&0x00CE], + ParsedTokenValue::String("account".into()) + ); + } + + #[test] + fn wrap_preserves_payload_and_sets_payload_present() { + let request = signed_request(Some(br#"{"id":"doc1"}"#)); + let auth_context = + AuthorizationContext::new(Method::Post, ResourceType::Document, "dbs/db1/colls/coll1"); + + let wrapped = wrap_request_for_gateway20( + &request, + &wrap_inputs(&auth_context, OperationType::Create, None, None), + ) + .unwrap(); + let parsed = parse_wrapped_request(&wrapped, 9); + + assert_eq!(parsed.tokens[&0x0002], ParsedTokenValue::Byte(1)); + assert_eq!(parsed.body, Some(br#"{"id":"doc1"}"#.to_vec())); + } + + #[test] + fn wrap_omits_document_name_for_create() { + let request = signed_request(Some(b"{}")); + let auth_context = + AuthorizationContext::new(Method::Post, ResourceType::Document, "dbs/db1/colls/coll1"); + + let wrapped = wrap_request_for_gateway20( + &request, + &wrap_inputs(&auth_context, OperationType::Create, None, None), + ) + .unwrap(); + let parsed = parse_wrapped_request(&wrapped, 9); + + assert!(!parsed.tokens.contains_key(&0x0017)); + } + + #[test] + fn wrap_uses_resolved_consistency_token() { + let request = signed_request(None); + let auth_context = AuthorizationContext::new( + Method::Get, + ResourceType::Document, + "dbs/db1/colls/coll1/docs/doc1", + ); + let mut inputs = wrap_inputs(&auth_context, OperationType::Read, None, None); + inputs.effective_consistency = DefaultConsistencyLevel::Eventual; + + let wrapped = wrap_request_for_gateway20(&request, &inputs).unwrap(); + let parsed = parse_wrapped_request(&wrapped, 10); + + assert_eq!(parsed.tokens[&0x0010], ParsedTokenValue::Byte(0x03)); + } + + #[test] + fn 
wrap_computes_effective_partition_key_bytes() { + let request = signed_request(None); + let auth_context = AuthorizationContext::new( + Method::Get, + ResourceType::Document, + "dbs/db1/colls/coll1/docs/doc1", + ); + let partition_key = PartitionKey::from("tenant1"); + let partition_key_definition = PartitionKeyDefinition::new(vec![Cow::from("/tenantId")]); + let expected = hex_to_bytes( + EffectivePartitionKey::compute( + partition_key.values(), + PartitionKeyKind::Hash, + PartitionKeyVersion::V2, + ) + .as_str(), + ) + .unwrap(); + + let wrapped = wrap_request_for_gateway20( + &request, + &wrap_inputs( + &auth_context, + OperationType::Read, + Some(&partition_key), + Some(&partition_key_definition), + ), + ) + .unwrap(); + let parsed = parse_wrapped_request(&wrapped, 11); + + assert_eq!(parsed.tokens[&0x005A], ParsedTokenValue::Bytes(expected)); + } + + #[test] + fn wrap_only_keeps_user_agent_and_activity_id_headers() { + let request = signed_request(None); + let auth_context = AuthorizationContext::new( + Method::Get, + ResourceType::Document, + "dbs/db1/colls/coll1/docs/doc1", + ); + + let wrapped = wrap_request_for_gateway20( + &request, + &wrap_inputs(&auth_context, OperationType::Read, None, None), + ) + .unwrap(); + + assert_eq!( + wrapped.headers.get_optional_str(&USER_AGENT), + Some("test-agent") + ); + assert_eq!( + wrapped.headers.get_optional_str(&X_MS_ACTIVITY_ID), + Some(ACTIVITY_ID) + ); + assert!(wrapped.headers.get_optional_str(&AUTHORIZATION).is_none()); + assert!(wrapped.headers.get_optional_str(&X_MS_DATE).is_none()); + assert!(wrapped.headers.get_optional_str(&CONTENT_TYPE).is_none()); + assert!(wrapped.headers.get_optional_str(&ACCEPT).is_none()); + } + + #[test] + fn wrap_rejects_missing_authorization_header() { + let mut request = signed_request(None); + request.headers.remove(AUTHORIZATION); + let auth_context = AuthorizationContext::new( + Method::Get, + ResourceType::Document, + "dbs/db1/colls/coll1/docs/doc1", + ); + + let error = 
wrap_request_for_gateway20( + &request, + &wrap_inputs(&auth_context, OperationType::Read, None, None), + ) + .unwrap_err(); + + assert_eq!(error.kind(), &ErrorKind::DataConversion); + } + + #[test] + fn wrap_rejects_missing_date_header() { + let mut request = signed_request(None); + request.headers.remove(X_MS_DATE); + let auth_context = AuthorizationContext::new( + Method::Get, + ResourceType::Document, + "dbs/db1/colls/coll1/docs/doc1", + ); + + let error = wrap_request_for_gateway20( + &request, + &wrap_inputs(&auth_context, OperationType::Read, None, None), + ) + .unwrap_err(); + + assert_eq!(error.kind(), &ErrorKind::DataConversion); + } + + #[test] + fn wrap_rejects_invalid_activity_id() { + let mut request = signed_request(None); + request.headers.insert(X_MS_ACTIVITY_ID, "not-a-guid"); + let auth_context = AuthorizationContext::new( + Method::Get, + ResourceType::Document, + "dbs/db1/colls/coll1/docs/doc1", + ); + + let error = wrap_request_for_gateway20( + &request, + &wrap_inputs(&auth_context, OperationType::Read, None, None), + ) + .unwrap_err(); + + assert_eq!(error.kind(), &ErrorKind::DataConversion); + } + + #[test] + fn unwrap_maps_response_status_headers_and_body() { + let activity_id = Uuid::parse_str(ACTIVITY_ID).unwrap(); + let response = HttpResponse { + status: 200, + headers: Headers::new(), + body: response_frame( + 404, + activity_id, + |tokens| { + write_u32_token(tokens, 0x001C, 1002); + write_double_token(tokens, 0x0015, 3.5); + write_string_token(tokens, 0x003E, "1:2#3"); + write_string_token(tokens, 0x0004, "\"etag\""); + write_string_token(tokens, 0x0003, "continuation"); + write_i64_token(tokens, 0x0013, 42); + write_i64_token(tokens, 0x0032, 43); + write_i64_token(tokens, 0x0029, 44); + write_string_token(tokens, 0x0017, "dbs/db1/colls/coll1/docs/doc1"); + }, + b"{}", + ), + }; + + let unwrapped = unwrap_response_for_gateway20(response).unwrap(); + + assert_eq!(unwrapped.status, 404); + assert_eq!(unwrapped.body, b"{}".to_vec()); + 
assert_eq!( + unwrapped.headers.get_optional_str(&X_MS_ACTIVITY_ID), + Some(ACTIVITY_ID) + ); + assert_eq!( + unwrapped + .headers + .get_optional_str(&HeaderName::from_static("x-ms-substatus")), + Some("1002") + ); + assert_eq!( + unwrapped + .headers + .get_optional_str(&HeaderName::from_static("x-ms-request-charge")), + Some("3.5") + ); + assert_eq!( + unwrapped + .headers + .get_optional_str(&HeaderName::from_static("x-ms-session-token")), + Some("1:2#3") + ); + assert_eq!( + unwrapped + .headers + .get_optional_str(&HeaderName::from_static("etag")), + Some("\"etag\"") + ); + assert_eq!( + unwrapped + .headers + .get_optional_str(&HeaderName::from_static("x-ms-continuation")), + Some("continuation") + ); + assert_eq!( + unwrapped + .headers + .get_optional_str(&HeaderName::from_static("lsn")), + Some("42") + ); + assert_eq!(unwrapped.headers.get_optional_str(&X_MS_LSN), Some("42")); + assert_eq!( + unwrapped + .headers + .get_optional_str(&HeaderName::from_static("x-ms-item-lsn")), + Some("43") + ); + assert_eq!( + unwrapped + .headers + .get_optional_str(&X_MS_GLOBAL_COMMITTED_LSN), + Some("44") + ); + } + + #[test] + fn unwrap_preserves_retry_after_for_throttle() { + let response = HttpResponse { + status: 200, + headers: Headers::new(), + body: response_frame( + 429, + Uuid::parse_str(ACTIVITY_ID).unwrap(), + |tokens| write_u32_token(tokens, 0x000C, 125), + b"", + ), + }; + + let unwrapped = unwrap_response_for_gateway20(response).unwrap(); + + assert_eq!(unwrapped.status, 429); + assert_eq!( + unwrapped + .headers + .get_optional_str(&HeaderName::from_static("x-ms-retry-after-ms")), + Some("125") + ); + } + + #[test] + fn unwrap_rejects_malformed_rntbd_body() { + let response = HttpResponse { + status: 200, + headers: Headers::new(), + body: vec![1, 2, 3], + }; + + let error = unwrap_response_for_gateway20(response).unwrap_err(); + + assert_eq!(error.kind(), &ErrorKind::DataConversion); + } + + #[test] + fn unwrap_rejects_out_of_range_inner_status() { + let 
response = HttpResponse { + status: 200, + headers: Headers::new(), + body: response_frame(70_000, Uuid::parse_str(ACTIVITY_ID).unwrap(), |_| {}, b""), + }; + + let error = unwrap_response_for_gateway20(response).unwrap_err(); + + assert_eq!(error.kind(), &ErrorKind::DataConversion); + } + + fn parsed_transport_id(parsed: &ParsedRequest) -> u32 { + match parsed.tokens[&0x004D] { + ParsedTokenValue::ULong(value) => value, + _ => unreachable!(), + } + } + + fn response_frame( + status: u32, + activity_id: Uuid, + write_tokens: impl FnOnce(&mut Vec<u8>), + body: &[u8], + ) -> Vec<u8> { + let mut bytes = Vec::new(); + bytes.extend_from_slice(&0_u32.to_le_bytes()); + bytes.extend_from_slice(&status.to_le_bytes()); + write_uuid(&mut bytes, activity_id); + write_tokens(&mut bytes); + bytes.extend_from_slice(body); + let total_len = u32::try_from(bytes.len()).unwrap(); + bytes[0..4].copy_from_slice(&total_len.to_le_bytes()); + bytes + } + + fn write_string_token(bytes: &mut Vec<u8>, id: u16, value: &str) { + bytes.extend_from_slice(&id.to_le_bytes()); + bytes.push(0x08); + bytes.extend_from_slice(&(value.len() as u16).to_le_bytes()); + bytes.extend_from_slice(value.as_bytes()); + } + + fn write_u32_token(bytes: &mut Vec<u8>, id: u16, value: u32) { + bytes.extend_from_slice(&id.to_le_bytes()); + bytes.push(0x02); + bytes.extend_from_slice(&value.to_le_bytes()); + } + + fn write_i64_token(bytes: &mut Vec<u8>, id: u16, value: i64) { + bytes.extend_from_slice(&id.to_le_bytes()); + bytes.push(0x05); + bytes.extend_from_slice(&value.to_le_bytes()); + } + + fn write_double_token(bytes: &mut Vec<u8>, id: u16, value: f64) { + bytes.extend_from_slice(&id.to_le_bytes()); + bytes.push(0x0E); + bytes.extend_from_slice(&value.to_le_bytes()); + } + + fn write_uuid(bytes: &mut Vec<u8>, value: Uuid) { + let value = value.as_u128(); + let msb = (value >> 64) as u64; + let lsb = value as u64; + bytes.extend_from_slice(&msb.to_le_bytes()); + bytes.extend_from_slice(&lsb.to_le_bytes()); + } + + fn take_u8(src: &mut
&[u8]) -> u8 { + let value = src[0]; + *src = &src[1..]; + value + } + + fn take_u16(src: &mut &[u8]) -> u16 { + u16::from_le_bytes(take_array(src)) + } + + fn take_u32(src: &mut &[u8]) -> u32 { + u32::from_le_bytes(take_array(src)) + } + + fn take_i64(src: &mut &[u8]) -> i64 { + i64::from_le_bytes(take_array(src)) + } + + fn take_uuid(src: &mut &[u8]) -> Uuid { + let msb = u64::from_le_bytes(take_array(src)); + let lsb = u64::from_le_bytes(take_array(src)); + Uuid::from_u128(((msb as u128) << 64) | lsb as u128) + } + + fn take_string(src: &mut &[u8], len: usize) -> String { + String::from_utf8(take_bytes(src, len).to_vec()).unwrap() + } + + fn take_bytes<'a>(src: &mut &'a [u8], len: usize) -> &'a [u8] { + let (head, tail) = src.split_at(len); + *src = tail; + head + } + + fn take_array<const N: usize>(src: &mut &[u8]) -> [u8; N] { + let bytes = take_bytes(src, N); + let mut out = [0; N]; + out.copy_from_slice(bytes); + out + } +} diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/mod.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/mod.rs index 991fcde35f0..4d158261b26 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/mod.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/mod.rs @@ -20,6 +20,7 @@ pub(crate) mod background_task_manager; pub(crate) mod cosmos_headers; pub(crate) mod cosmos_transport_client; mod emulator; +mod gateway20_dispatch; /// Gateway 2.0 operation eligibility filter.
pub(crate) mod gateway20_eligibility; pub(crate) use gateway20_eligibility::is_operation_supported_by_gateway20; @@ -52,6 +53,9 @@ use self::http_client_factory::DefaultHttpClientFactory; pub(crate) use authorization_policy::generate_authorization; pub(crate) use authorization_policy::AuthorizationContext; pub(crate) use emulator::is_emulator_host; +pub(crate) use gateway20_dispatch::{ + unwrap_response_for_gateway20, wrap_request_for_gateway20, WrapInputs, +}; pub(crate) use tracked_transport::infer_request_sent_status; /// Cosmos DB REST API version. diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/tokens.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/tokens.rs index 62a3303baee..7311ee3f83c 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/tokens.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/tokens.rs @@ -6,7 +6,7 @@ use azure_core::error::ErrorKind; use uuid::Uuid; -use crate::models::{OperationType, ResourceType}; +use crate::models::{DefaultConsistencyLevel, OperationType, ResourceType}; /// The token type byte used by RNTBD metadata tokens. 
/// @@ -257,6 +257,90 @@ impl Token { Self { id, value } } + pub(crate) fn authorization_token(value: String) -> Self { + Self::new( + RntbdRequestToken::AuthorizationToken.into(), + TokenValue::String(value), + ) + } + + pub(crate) fn payload_present(value: bool) -> Self { + Self::new( + RntbdRequestToken::PayloadPresent.into(), + TokenValue::Byte(u8::from(value)), + ) + } + + pub(crate) fn date(value: String) -> Self { + Self::new( + RntbdRequestToken::Date.into(), + TokenValue::SmallString(value), + ) + } + + pub(crate) fn consistency_level(value: DefaultConsistencyLevel) -> Self { + let value = match value { + DefaultConsistencyLevel::Strong => 0x00, + DefaultConsistencyLevel::BoundedStaleness => 0x01, + DefaultConsistencyLevel::Session => 0x02, + DefaultConsistencyLevel::Eventual => 0x03, + DefaultConsistencyLevel::ConsistentPrefix => 0x04, + }; + Self::new( + RntbdRequestToken::ConsistencyLevel.into(), + TokenValue::Byte(value), + ) + } + + pub(crate) fn database_name(value: String) -> Self { + Self::new( + RntbdRequestToken::DatabaseName.into(), + TokenValue::String(value), + ) + } + + pub(crate) fn collection_name(value: String) -> Self { + Self::new( + RntbdRequestToken::CollectionName.into(), + TokenValue::String(value), + ) + } + + pub(crate) fn document_name(value: String) -> Self { + Self::new( + RntbdRequestToken::DocumentName.into(), + TokenValue::String(value), + ) + } + + pub(crate) fn transport_request_id(value: u32) -> Self { + Self::new( + RntbdRequestToken::TransportRequestId.into(), + TokenValue::ULong(value), + ) + } + + pub(crate) fn effective_partition_key(value: Vec<u8>) -> Self { + Self::new( + RntbdRequestToken::EffectivePartitionKey.into(), + TokenValue::Bytes(value), + ) + } + + pub(crate) fn sdk_supported_capabilities(value: u32) -> Self { + Self::new( + RntbdRequestToken::SDKSupportedCapabilities.into(), + TokenValue::ULong(value), + ) + } + + pub(crate) fn global_database_account_name(value: String) -> Self { + Self::new( +
RntbdRequestToken::GlobalDatabaseAccountName.into(), + TokenValue::String(value), + ) + } + /// Returns the number of bytes this token occupies on the wire. pub(super) fn encoded_len(&self) -> usize { 2 + 1 + self.value.encoded_len() @@ -281,6 +365,61 @@ impl Token { } } +/// RNTBD request metadata token IDs used by Gateway 2.0 dispatch. +#[derive(Clone, Copy, Debug, PartialEq, Eq)] +pub(crate) enum RntbdRequestToken { + AuthorizationToken, + PayloadPresent, + Date, + ConsistencyLevel, + DatabaseName, + CollectionName, + DocumentName, + TransportRequestId, + EffectivePartitionKey, + SDKSupportedCapabilities, + GlobalDatabaseAccountName, +} + +impl TryFrom<u16> for RntbdRequestToken { + type Error = (); + + fn try_from(value: u16) -> Result<Self, Self::Error> { + match value { + 0x0001 => Ok(Self::AuthorizationToken), + 0x0002 => Ok(Self::PayloadPresent), + 0x0003 => Ok(Self::Date), + 0x0010 => Ok(Self::ConsistencyLevel), + 0x0015 => Ok(Self::DatabaseName), + 0x0016 => Ok(Self::CollectionName), + 0x0017 => Ok(Self::DocumentName), + 0x004D => Ok(Self::TransportRequestId), + 0x005A => Ok(Self::EffectivePartitionKey), + 0x00A2 => Ok(Self::SDKSupportedCapabilities), + 0x00CE => Ok(Self::GlobalDatabaseAccountName), + _ => Err(()), + } + } +} + +impl From<RntbdRequestToken> for u16 { + fn from(value: RntbdRequestToken) -> Self { + match value { + RntbdRequestToken::AuthorizationToken => 0x0001, + RntbdRequestToken::PayloadPresent => 0x0002, + RntbdRequestToken::Date => 0x0003, + RntbdRequestToken::ConsistencyLevel => 0x0010, + RntbdRequestToken::DatabaseName => 0x0015, + RntbdRequestToken::CollectionName => 0x0016, + RntbdRequestToken::DocumentName => 0x0017, + RntbdRequestToken::TransportRequestId => 0x004D, + RntbdRequestToken::EffectivePartitionKey => 0x005A, + RntbdRequestToken::SDKSupportedCapabilities => 0x00A2, + RntbdRequestToken::GlobalDatabaseAccountName => 0x00CE, + } + } +} + /// RNTBD response metadata token IDs recognized by Slice 1. pub(super) enum RntbdResponseToken { /// Continuation token.
diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/transport_pipeline.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/transport_pipeline.rs index ff0cdde4470..fe69e21ba8d 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/transport_pipeline.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/transport_pipeline.rs @@ -31,11 +31,13 @@ use crate::{ use super::{ adaptive_transport::AdaptiveTransport, cosmos_headers::apply_cosmos_headers, cosmos_transport_client::HttpRequest, infer_request_sent_status, request_signing::sign_request, - sharded_transport::EndpointKey, + sharded_transport::EndpointKey, unwrap_response_for_gateway20, wrap_request_for_gateway20, + WrapInputs, }; use crate::driver::pipeline::components::{ - ThrottleAction, ThrottleRetryState, TransportOutcome, TransportRequest, TransportResult, + ThrottleAction, ThrottleRetryState, TransportMode, TransportOutcome, TransportRequest, + TransportResult, }; /// Cosmos DB retry-after header (milliseconds). @@ -153,6 +155,8 @@ pub(crate) struct TransportPipelineContext<'a> { /// Computed once by the operation pipeline from the routing-level endpoint /// so the transport pipeline doesn't need to allocate a `String` per attempt. pub endpoint_key: EndpointKey, + /// Global database account name used by Gateway 2.0 request wrapping. + pub account_name: Option<String>, } /// Executes a single transport attempt.
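For readers reviewing the dispatch tests, the request frame that the test helper `parse_wrapped_request` decodes can be summarized as a little-endian `u32` total length, a `u16` resource type, a `u16` operation type, and a 16-byte activity ID stored as two little-endian `u64` halves (most significant first). These are followed by metadata tokens, each laid out as a `u16` id, a `u8` type byte, and the value. The block below is a minimal std-only sketch built on that reading of the tests. `Value`, `write_token`, and `build_frame` are hypothetical names, not the driver's API. Only four of the token types handled by `parse_token_value` are modeled, and the payload section is omitted.

```rust
// Hypothetical, std-only sketch of the RNTBD request layout that the test
// helper `parse_wrapped_request` decodes. Not the driver's real API.

/// A subset of RNTBD token values; type bytes match `parse_token_value`.
#[allow(dead_code)]
enum Value<'a> {
    Byte(u8),             // type 0x00: one byte
    ULong(u32),           // type 0x02: u32, little-endian
    SmallString(&'a str), // type 0x07: u8 length prefix, then UTF-8 bytes
    String(&'a str),      // type 0x08: u16 length prefix, then UTF-8 bytes
}

/// Appends one token: u16 id (LE), u8 type byte, then the encoded value.
fn write_token(out: &mut Vec<u8>, id: u16, value: &Value<'_>) {
    out.extend_from_slice(&id.to_le_bytes());
    match value {
        Value::Byte(v) => {
            out.push(0x00);
            out.push(*v);
        }
        Value::ULong(v) => {
            out.push(0x02);
            out.extend_from_slice(&v.to_le_bytes());
        }
        Value::SmallString(s) => {
            out.push(0x07);
            out.push(u8::try_from(s.len()).expect("small string exceeds 255 bytes"));
            out.extend_from_slice(s.as_bytes());
        }
        Value::String(s) => {
            out.push(0x08);
            let len = u16::try_from(s.len()).expect("string exceeds 65535 bytes");
            out.extend_from_slice(&len.to_le_bytes());
            out.extend_from_slice(s.as_bytes());
        }
    }
}

/// Builds a metadata-only frame (no payload section) and back-fills the
/// leading u32 length so that it covers the whole buffer.
fn build_frame(
    resource_type: u16,
    operation_type: u16,
    activity_id: u128,
    tokens: &[(u16, Value<'_>)],
) -> Vec<u8> {
    let mut out = vec![0u8; 4]; // placeholder for the total length
    out.extend_from_slice(&resource_type.to_le_bytes());
    out.extend_from_slice(&operation_type.to_le_bytes());
    // Activity ID: most-significant 64 bits first, each half little-endian.
    out.extend_from_slice(&((activity_id >> 64) as u64).to_le_bytes());
    out.extend_from_slice(&(activity_id as u64).to_le_bytes());
    for (id, value) in tokens {
        write_token(&mut out, *id, value);
    }
    let total = u32::try_from(out.len()).expect("frame exceeds u32::MAX bytes");
    out[0..4].copy_from_slice(&total.to_le_bytes());
    out
}
```

A frame carrying the `PayloadPresent` (0x0002) byte token and the `DatabaseName` (0x0015) string token `"db1"` is 36 bytes long: 24 header bytes, 4 bytes for the byte token, and 8 bytes for the three-character string token.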
@@ -251,6 +255,25 @@ pub(crate) async fn execute_transport_pipeline( }; } + let should_unwrap_gateway20 = request.transport_mode == TransportMode::Gateway20; + if should_unwrap_gateway20 { + let wrap_inputs = WrapInputs { + auth_context: &request.auth_context, + operation_type: request.operation_type, + resource_type: request.auth_context.resource_type, + partition_key: request.partition_key.as_ref(), + partition_key_definition: request.partition_key_definition.as_ref(), + effective_consistency: request.effective_consistency, + account_name: ctx.account_name.as_deref(), + }; + match wrap_request_for_gateway20(&http_request, &wrap_inputs) { + Ok(wrapped_request) => http_request = wrapped_request, + Err(e) => { + return gateway20_wrap_error_result(e, request_handle, diagnostics); + } + } + } + // Record transport start event diagnostics.add_event( request_handle, @@ -274,6 +297,7 @@ pub(crate) async fn execute_transport_pipeline( diagnostics, excluded_shard_id.take(), endpoint_key, + should_unwrap_gateway20, ) .await; @@ -369,6 +393,7 @@ fn deadline_exceeded_result(request_sent: RequestSentStatus) -> TransportResult TransportResult::deadline_exceeded(request_sent) } +#[allow(clippy::too_many_arguments)] async fn execute_http_attempt( http_request: &HttpRequest, transport: &AdaptiveTransport, @@ -377,6 +402,7 @@ async fn execute_http_attempt( diagnostics: &mut DiagnosticsContextBuilder, excluded_shard_id: Option, endpoint_key: &EndpointKey, + should_unwrap_gateway20: bool, ) -> ExecutedTransportAttempt { if let Some(timeout_duration) = per_request_timeout { // Pre-select the shard so we know which shard the request was dispatched @@ -405,9 +431,12 @@ async fn execute_http_attempt( pin_mut!(timeout_future); return match futures::future::select(transport_future, timeout_future).await { - Either::Left((attempt_result, _)) => { - finalize_http_attempt(attempt_result, request_handle, diagnostics) - } + Either::Left((attempt_result, _)) => finalize_http_attempt( + 
attempt_result, + request_handle, + diagnostics, + should_unwrap_gateway20, + ), Either::Right((_, _remaining_transport_future)) => { diagnostics.add_event( request_handle, @@ -432,7 +461,12 @@ async fn execute_http_attempt( None, ) .await; - finalize_http_attempt(attempt_result, request_handle, diagnostics) + finalize_http_attempt( + attempt_result, + request_handle, + diagnostics, + should_unwrap_gateway20, + ) } async fn execute_http_attempt_future( @@ -472,6 +506,7 @@ fn finalize_http_attempt( attempt_result: HttpAttemptResult, request_handle: RequestHandle, diagnostics: &mut DiagnosticsContextBuilder, + should_unwrap_gateway20: bool, ) -> ExecutedTransportAttempt { match attempt_result { HttpAttemptResult::Response { @@ -488,6 +523,36 @@ fn finalize_http_attempt( if let Some(shard_diagnostics) = shard_diagnostics.clone() { diagnostics.set_transport_shard(request_handle, shard_diagnostics); } + + let (status_code, headers, body) = if should_unwrap_gateway20 + && status_code == azure_core::http::StatusCode::Ok + { + match unwrap_response_for_gateway20(super::cosmos_transport_client::HttpResponse { + status: u16::from(status_code), + headers, + body, + }) { + Ok(response) => ( + azure_core::http::StatusCode::from(response.status), + response.headers, + response.body, + ), + Err(error) => { + return ExecutedTransportAttempt { + result: gateway20_unwrap_error_result( + error, + request_handle, + diagnostics, + ), + shard_id, + shard_diagnostics, + }; + } + } + } else { + (status_code, headers, body) + }; + ExecutedTransportAttempt { result: map_http_response_payload( status_code, @@ -548,6 +613,56 @@ fn format_transport_error_details(error: &azure_core::Error) -> String { crate::driver::error_chain_summary(error) } +fn gateway20_wrap_error_result( + error: azure_core::Error, + request_handle: RequestHandle, + diagnostics: &mut DiagnosticsContextBuilder, +) -> TransportResult { + let status = CosmosStatus::CLIENT_GENERATED_400; + let error_details = 
format_transport_error_details(&error); + diagnostics.fail_transport_request( + request_handle, + error_details, + RequestSentStatus::NotSent, + status, + ); + + TransportResult { + outcome: TransportOutcome::TransportError { + status, + error, + request_sent: RequestSentStatus::NotSent, + }, + } +} + +fn gateway20_unwrap_error_result( + error: azure_core::Error, + request_handle: RequestHandle, + diagnostics: &mut DiagnosticsContextBuilder, +) -> TransportResult { + let status = CosmosStatus::TRANSPORT_GENERATED_503; + let error_details = format_transport_error_details(&error); + diagnostics.add_event( + request_handle, + RequestEvent::new(RequestEventType::TransportFailed).with_details(error_details.clone()), + ); + diagnostics.fail_transport_request( + request_handle, + error_details, + RequestSentStatus::Sent, + status, + ); + + TransportResult { + outcome: TransportOutcome::TransportError { + status, + error, + request_sent: RequestSentStatus::Sent, + }, + } +} + fn transport_error_result( error: azure_core::Error, headers_received: bool, @@ -660,6 +775,7 @@ fn map_http_response_payload( mod tests { use super::*; use std::{ + collections::VecDeque, sync::{Arc, Mutex}, time::Duration, }; @@ -667,7 +783,7 @@ mod tests { use async_trait::async_trait; use crate::{ - diagnostics::DiagnosticsContextBuilder, + diagnostics::{DiagnosticsContextBuilder, RequestSentStatus}, driver::{ routing::CosmosEndpoint, transport::{ @@ -678,7 +794,7 @@ mod tests { http_client_factory::{HttpClientConfig, HttpClientFactory}, }, }, - models::{ActivityId, Credential, ResourceType}, + models::{ActivityId, Credential, DefaultConsistencyLevel, OperationType, ResourceType}, options::DiagnosticsOptions, }; @@ -889,6 +1005,11 @@ mod tests { let request = TransportRequest { method: azure_core::http::Method::Get, endpoint: endpoint.clone(), + transport_mode: TransportMode::Gateway, + operation_type: OperationType::Read, + partition_key: None, + partition_key_definition: None, + 
effective_consistency: DefaultConsistencyLevel::Session, url: endpoint.url().clone(), headers: azure_core::http::headers::Headers::new(), body: None, @@ -918,6 +1039,7 @@ mod tests { pipeline_type: PipelineType::Metadata, transport_security: TransportSecurity::Secure, endpoint_key: endpoint.endpoint_key(), + account_name: None, }, &mut diagnostics, ) @@ -1027,6 +1149,11 @@ mod tests { TransportRequest { method: azure_core::http::Method::Get, endpoint: endpoint.clone(), + transport_mode: TransportMode::Gateway, + operation_type: OperationType::Read, + partition_key: None, + partition_key_definition: None, + effective_consistency: DefaultConsistencyLevel::Session, url: endpoint.url().clone(), headers: azure_core::http::headers::Headers::new(), body: None, @@ -1063,6 +1190,7 @@ mod tests { pipeline_type: PipelineType::DataPlane, transport_security: TransportSecurity::Secure, endpoint_key: test_endpoint_key(), + account_name: None, }, &mut diagnostics, ) @@ -1111,6 +1239,7 @@ mod tests { pipeline_type: PipelineType::DataPlane, transport_security: TransportSecurity::Secure, endpoint_key: test_endpoint_key(), + account_name: None, }, &mut diagnostics, ) @@ -1148,6 +1277,7 @@ mod tests { pipeline_type: PipelineType::DataPlane, transport_security: TransportSecurity::Secure, endpoint_key: test_endpoint_key(), + account_name: None, }, &mut diagnostics, ) @@ -1183,6 +1313,7 @@ mod tests { pipeline_type: PipelineType::DataPlane, transport_security: TransportSecurity::Secure, endpoint_key: test_endpoint_key(), + account_name: None, }, &mut diagnostics, ) @@ -1207,6 +1338,382 @@ mod tests { assert_eq!(requests[0].request_sent(), RequestSentStatus::NotSent); } + #[derive(Debug)] + struct Gateway20MockTransportClient { + responses: Mutex<VecDeque<HttpResponse>>, + requests: Mutex<Vec<HttpRequest>>, + } + + impl Gateway20MockTransportClient { + fn new(responses: Vec<HttpResponse>) -> Self { + Self { + responses: Mutex::new(responses.into()), + requests: Mutex::new(Vec::new()), + } + } + + fn requests(&self) -> Vec<HttpRequest> { +
self.requests.lock().unwrap().clone() + } + } + + #[async_trait] + impl TransportClient for Gateway20MockTransportClient { + async fn send(&self, request: &HttpRequest) -> Result<HttpResponse, TransportError> { + self.requests.lock().unwrap().push(request.clone()); + self.responses.lock().unwrap().pop_front().ok_or_else(|| { + TransportError::new( + azure_core::Error::with_message(ErrorKind::Other, "no response queued"), + RequestSentStatus::Unknown, + ) + }) + } + } + + const GATEWAY20_ACTIVITY_ID: &str = "00112233-4455-6677-8899-aabbccddeeff"; + + fn gateway20_transport_request(transport_mode: TransportMode) -> TransportRequest { + let endpoint = CosmosEndpoint::global( + url::Url::parse("https://test-thin.documents.azure.com:444/").unwrap(), + ); + let mut headers = azure_core::http::headers::Headers::new(); + headers.insert("x-ms-activity-id", GATEWAY20_ACTIVITY_ID); + TransportRequest { + method: azure_core::http::Method::Get, + endpoint: endpoint.clone(), + transport_mode, + operation_type: OperationType::Read, + partition_key: None, + partition_key_definition: None, + effective_consistency: DefaultConsistencyLevel::Session, + url: endpoint.url().clone(), + headers, + body: None, + auth_context: super::super::AuthorizationContext::new( + azure_core::http::Method::Get, + ResourceType::Document, + "dbs/db1/colls/coll1/docs/doc1", + ), + execution_context: ExecutionContext::Initial, + deadline: None, + } + } + + fn gateway20_context<'a>( + client: &'a AdaptiveTransport, + endpoint_key: EndpointKey, + account_name: Option<String>, + credential: &'a Credential, + user_agent: &'a azure_core::http::headers::HeaderValue, + ) -> TransportPipelineContext<'a> { + TransportPipelineContext { + transport: client, + allow_sent_transport_retry: false, + credential, + user_agent, + pipeline_type: PipelineType::DataPlane, + transport_security: TransportSecurity::Secure, + endpoint_key, + account_name, + } + } + + fn gateway20_diagnostics() -> DiagnosticsContextBuilder { + DiagnosticsContextBuilder::new( +
ActivityId::from_string(GATEWAY20_ACTIVITY_ID.to_owned()), + Arc::new(DiagnosticsOptions::default()), + ) + } + + #[tokio::test] + async fn gateway20_pipeline_wraps_request_and_unwraps_success_response() { + let mock = Arc::new(Gateway20MockTransportClient::new(vec![gateway20_response( + 200, + |_| {}, + b"{}", + )])); + let client = AdaptiveTransport::Gateway(mock.clone()); + let request = gateway20_transport_request(TransportMode::Gateway20); + let endpoint_key = request.endpoint.endpoint_key(); + let credential = Credential::from(azure_core::credentials::Secret::new("dGVzdA==")); + let user_agent = azure_core::http::headers::HeaderValue::from_static("test-agent"); + let mut diagnostics = gateway20_diagnostics(); + + let result = execute_transport_pipeline( + request, + &gateway20_context( + &client, + endpoint_key, + Some("account".to_owned()), + &credential, + &user_agent, + ), + &mut diagnostics, + ) + .await; + + match result.outcome { + TransportOutcome::Success { status, body, .. 
} => { + assert_eq!(status.status_code(), azure_core::http::StatusCode::Ok); + assert_eq!(body, b"{}".to_vec()); + } + other => panic!("expected success, got {other:?}"), + } + let captured = mock.requests(); + assert_eq!(captured.len(), 1); + assert_eq!(captured[0].method, azure_core::http::Method::Post); + assert_eq!( + captured[0] + .headers + .get_optional_str(&azure_core::http::headers::AUTHORIZATION), + None + ); + assert_eq!( + captured[0] + .headers + .get_optional_str(&azure_core::http::headers::USER_AGENT), + Some("test-agent") + ); + assert!(captured[0] + .body + .as_ref() + .is_some_and(|body| !body.is_empty())); + } + + #[tokio::test] + async fn gateway20_pipeline_leaves_standard_gateway_request_unwrapped() { + let mock = Arc::new(Gateway20MockTransportClient::new(vec![HttpResponse { + status: 200, + headers: azure_core::http::headers::Headers::new(), + body: b"plain".to_vec(), + }])); + let client = AdaptiveTransport::Gateway(mock.clone()); + let request = gateway20_transport_request(TransportMode::Gateway); + let endpoint_key = request.endpoint.endpoint_key(); + let credential = Credential::from(azure_core::credentials::Secret::new("dGVzdA==")); + let user_agent = azure_core::http::headers::HeaderValue::from_static("test-agent"); + let mut diagnostics = gateway20_diagnostics(); + + let result = execute_transport_pipeline( + request, + &gateway20_context(&client, endpoint_key, None, &credential, &user_agent), + &mut diagnostics, + ) + .await; + + match result.outcome { + TransportOutcome::Success { body, .. 
} => assert_eq!(body, b"plain".to_vec()), + other => panic!("expected success, got {other:?}"), + } + let captured = mock.requests(); + assert_eq!(captured.len(), 1); + assert_eq!(captured[0].method, azure_core::http::Method::Get); + assert!(captured[0] + .headers + .get_optional_str(&azure_core::http::headers::AUTHORIZATION) + .is_some()); + } + + #[tokio::test] + async fn gateway20_pipeline_decode_failure_is_sent_transport_error() { + let mock = Arc::new(Gateway20MockTransportClient::new(vec![HttpResponse { + status: 200, + headers: azure_core::http::headers::Headers::new(), + body: vec![1, 2, 3], + }])); + let client = AdaptiveTransport::Gateway(mock); + let request = gateway20_transport_request(TransportMode::Gateway20); + let endpoint_key = request.endpoint.endpoint_key(); + let credential = Credential::from(azure_core::credentials::Secret::new("dGVzdA==")); + let user_agent = azure_core::http::headers::HeaderValue::from_static("test-agent"); + let mut diagnostics = gateway20_diagnostics(); + + let result = execute_transport_pipeline( + request, + &gateway20_context( + &client, + endpoint_key, + Some("account".to_owned()), + &credential, + &user_agent, + ), + &mut diagnostics, + ) + .await; + + match result.outcome { + TransportOutcome::TransportError { + status, + request_sent, + .. 
+ } => { + assert_eq!(status, CosmosStatus::TRANSPORT_GENERATED_503); + assert_eq!(request_sent, RequestSentStatus::Sent); + } + other => panic!("expected transport error, got {other:?}"), + } + } + + #[tokio::test] + async fn gateway20_pipeline_outer_502_propagates_unchanged_without_unwrap() { + let mock = Arc::new(Gateway20MockTransportClient::new(vec![HttpResponse { + status: 502, + headers: azure_core::http::headers::Headers::new(), + body: vec![], + }])); + let client = AdaptiveTransport::Gateway(mock.clone()); + let request = gateway20_transport_request(TransportMode::Gateway20); + let endpoint_key = request.endpoint.endpoint_key(); + let credential = Credential::from(azure_core::credentials::Secret::new("dGVzdA==")); + let user_agent = azure_core::http::headers::HeaderValue::from_static("test-agent"); + let mut diagnostics = gateway20_diagnostics(); + + let result = execute_transport_pipeline( + request, + &gateway20_context( + &client, + endpoint_key, + Some("account".to_owned()), + &credential, + &user_agent, + ), + &mut diagnostics, + ) + .await; + + match result.outcome { + TransportOutcome::HttpError { + status, + body, + request_sent, + .. 
+ } => { + assert_eq!(u16::from(status.status_code()), 502); + assert_eq!(status.sub_status(), None); + assert_eq!(body, Vec::<u8>::new()); + assert_eq!(request_sent, RequestSentStatus::Sent); + } + other => panic!("expected HTTP error, got {other:?}"), + } + assert_eq!(mock.requests().len(), 1); + } + + #[tokio::test] + async fn gateway20_pipeline_inner_401_surfaces_as_inner_status() { + let mock = Arc::new(Gateway20MockTransportClient::new(vec![gateway20_response( + 401, + |_| {}, + b"", + )])); + let client = AdaptiveTransport::Gateway(mock.clone()); + let request = gateway20_transport_request(TransportMode::Gateway20); + let endpoint_key = request.endpoint.endpoint_key(); + let credential = Credential::from(azure_core::credentials::Secret::new("dGVzdA==")); + let user_agent = azure_core::http::headers::HeaderValue::from_static("test-agent"); + let mut diagnostics = gateway20_diagnostics(); + + let result = execute_transport_pipeline( + request, + &gateway20_context( + &client, + endpoint_key, + Some("account".to_owned()), + &credential, + &user_agent, + ), + &mut diagnostics, + ) + .await; + + match result.outcome { + TransportOutcome::HttpError { + status, + body, + request_sent, + .. 
+ } => { + assert_eq!( + status.status_code(), + azure_core::http::StatusCode::Unauthorized + ); + assert_eq!(status.sub_status(), None); + assert_eq!(body, Vec::<u8>::new()); + assert_eq!(request_sent, RequestSentStatus::Sent); + } + other => panic!("expected HTTP error, got {other:?}"), + } + assert_eq!(mock.requests().len(), 1); + } + + #[tokio::test] + async fn gateway20_pipeline_uses_inner_retry_after_for_throttle_retry() { + let mock = Arc::new(Gateway20MockTransportClient::new(vec![ + gateway20_response( + 429, + |bytes| write_gateway20_u32_token(bytes, 0x000C, 0), + b"", + ), + gateway20_response(200, |_| {}, b"{}"), + ])); + let client = AdaptiveTransport::Gateway(mock.clone()); + let request = gateway20_transport_request(TransportMode::Gateway20); + let endpoint_key = request.endpoint.endpoint_key(); + let credential = Credential::from(azure_core::credentials::Secret::new("dGVzdA==")); + let user_agent = azure_core::http::headers::HeaderValue::from_static("test-agent"); + let mut diagnostics = gateway20_diagnostics(); + + let result = execute_transport_pipeline( + request, + &gateway20_context( + &client, + endpoint_key, + Some("account".to_owned()), + &credential, + &user_agent, + ), + &mut diagnostics, + ) + .await; + + assert!(matches!(result.outcome, TransportOutcome::Success { .. 
})); + assert_eq!(mock.requests().len(), 2); + } + + fn gateway20_response( + status: u32, + write_tokens: impl FnOnce(&mut Vec<u8>), + body: &[u8], + ) -> HttpResponse { + let mut bytes = Vec::new(); + bytes.extend_from_slice(&0_u32.to_le_bytes()); + bytes.extend_from_slice(&status.to_le_bytes()); + write_gateway20_uuid( + &mut bytes, + uuid::Uuid::parse_str(GATEWAY20_ACTIVITY_ID).unwrap(), + ); + write_tokens(&mut bytes); + bytes.extend_from_slice(body); + let total_len = u32::try_from(bytes.len()).unwrap(); + bytes[0..4].copy_from_slice(&total_len.to_le_bytes()); + HttpResponse { + status: 200, + headers: azure_core::http::headers::Headers::new(), + body: bytes, + } + } + + fn write_gateway20_u32_token(bytes: &mut Vec<u8>, id: u16, value: u32) { + bytes.extend_from_slice(&id.to_le_bytes()); + bytes.push(0x02); + bytes.extend_from_slice(&value.to_le_bytes()); + } + + fn write_gateway20_uuid(bytes: &mut Vec<u8>, value: uuid::Uuid) { + let value = value.as_u128(); + bytes.extend_from_slice(&((value >> 64) as u64).to_le_bytes()); + bytes.extend_from_slice(&(value as u64).to_le_bytes()); + } + #[test] fn format_transport_error_details_includes_error_chain() { let inner = std::io::Error::new(std::io::ErrorKind::ConnectionReset, "socket reset"); diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/models/cosmos_status.rs b/sdk/cosmos/azure_data_cosmos_driver/src/models/cosmos_status.rs index 1b11373585d..6150a532521 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/models/cosmos_status.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/models/cosmos_status.rs @@ -979,6 +979,9 @@ impl SubStatusCode { /// Transport generated 503 (20003). pub const TRANSPORT_GENERATED_503: SubStatusCode = SubStatusCode(20003); + /// Client generated 400 — request wrapping failure (20400). + pub const CLIENT_GENERATED_400: SubStatusCode = SubStatusCode(20400); + /// Client generated 401 — authorization/signing failure (20401). 
pub const CLIENT_GENERATED_401: SubStatusCode = SubStatusCode(20401); @@ -1336,6 +1339,15 @@ impl CosmosStatus { sub_status: Some(SubStatusCode::TRANSPORT_GENERATED_503), }; + /// Client-generated 400 Bad Request (sub-status 20400). + /// + /// Generated by the SDK when Gateway 2.0 request wrapping fails before + /// the request is sent. + pub const CLIENT_GENERATED_400: CosmosStatus = CosmosStatus { + status_code: StatusCode::BadRequest, + sub_status: Some(SubStatusCode::CLIENT_GENERATED_400), + }; + /// Client-generated 401 Unauthorized (sub-status 20401). /// /// Generated by the SDK when request signing/authorization fails before @@ -1814,6 +1826,8 @@ mod tests { assert_eq!(SubStatusCode::TRANSPORT_GENERATED_410.value(), 20001); assert_eq!(SubStatusCode::TIMEOUT_GENERATED_410.value(), 20002); assert_eq!(SubStatusCode::TRANSPORT_GENERATED_503.value(), 20003); + assert_eq!(SubStatusCode::CLIENT_GENERATED_400.value(), 20400); + assert_eq!(SubStatusCode::CLIENT_GENERATED_401.value(), 20401); assert_eq!(SubStatusCode::CLIENT_CPU_OVERLOAD.value(), 20004); assert_eq!(SubStatusCode::CLIENT_THREAD_STARVATION.value(), 20005); assert_eq!(SubStatusCode::CLIENT_OPERATION_TIMEOUT.value(), 20008); diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/options/mod.rs b/sdk/cosmos/azure_data_cosmos_driver/src/options/mod.rs index 559ae2dab7b..a95dce42fef 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/options/mod.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/options/mod.rs @@ -40,6 +40,7 @@ pub use policies::{ ExcludedRegions, }; pub use priority::PriorityLevel; +pub(crate) use read_consistency::resolve_effective_consistency; pub use read_consistency::ReadConsistencyStrategy; pub use region::Region; pub use throughput_control::ThroughputControlGroupOptions; diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/options/read_consistency.rs b/sdk/cosmos/azure_data_cosmos_driver/src/options/read_consistency.rs index 391f92515f2..33d8f175cb2 100644 --- 
a/sdk/cosmos/azure_data_cosmos_driver/src/options/read_consistency.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/options/read_consistency.rs @@ -98,6 +98,19 @@ impl ReadConsistencyStrategy { } } +/// Resolves the effective consistency level for a read consistency strategy. +pub(crate) fn resolve_effective_consistency( + strategy: ReadConsistencyStrategy, + account_default: DefaultConsistencyLevel, +) -> DefaultConsistencyLevel { + match strategy { + ReadConsistencyStrategy::Default => account_default, + ReadConsistencyStrategy::Eventual => DefaultConsistencyLevel::Eventual, + ReadConsistencyStrategy::Session => DefaultConsistencyLevel::Session, + ReadConsistencyStrategy::GlobalStrong => DefaultConsistencyLevel::Strong, + } +} + impl std::fmt::Display for ReadConsistencyStrategy { fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { f.write_str(self.as_str()) @@ -204,4 +217,37 @@ mod tests { assert!(!ReadConsistencyStrategy::GlobalStrong .is_session_effective(DefaultConsistencyLevel::Session)); } + + #[test] + fn resolve_effective_consistency_table() { + let account_defaults = [ + DefaultConsistencyLevel::Strong, + DefaultConsistencyLevel::BoundedStaleness, + DefaultConsistencyLevel::Session, + DefaultConsistencyLevel::ConsistentPrefix, + DefaultConsistencyLevel::Eventual, + ]; + + for account_default in account_defaults { + assert_eq!( + resolve_effective_consistency(ReadConsistencyStrategy::Default, account_default), + account_default + ); + assert_eq!( + resolve_effective_consistency(ReadConsistencyStrategy::Eventual, account_default), + DefaultConsistencyLevel::Eventual + ); + assert_eq!( + resolve_effective_consistency(ReadConsistencyStrategy::Session, account_default), + DefaultConsistencyLevel::Session + ); + assert_eq!( + resolve_effective_consistency( + ReadConsistencyStrategy::GlobalStrong, + account_default + ), + DefaultConsistencyLevel::Strong + ); + } + } } From 56890a799a8c6bd4acc21cc14706725d8eb1eb9c Mon Sep 17 00:00:00 2001 From: 
tvaron3 Date: Thu, 30 Apr 2026 00:27:35 -0700 Subject: [PATCH 29/48] Use AcqRel ordering for Gateway 2.0 transport request id MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Bumps `next_transport_request_id` from `Ordering::Relaxed` to `Ordering::AcqRel`. `Relaxed` already guarantees unique IDs: `fetch_add` is an atomic read-modify-write, and all read-modify-writes on the same atomic follow a single modification order regardless of the ordering argument. `AcqRel` is a conservative choice. It does not strengthen that guarantee and does not force IDs to appear in increasing order in concurrently written logs; it only adds acquire/release synchronization with the memory operations surrounding the increment. Found in deep-review pass over the Gateway 2.0 stack. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../src/driver/transport/gateway20_dispatch.rs | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/gateway20_dispatch.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/gateway20_dispatch.rs index 9f9d963ddef..797c03efb65 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/gateway20_dispatch.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/gateway20_dispatch.rs @@ -187,7 +187,10 @@ fn required_header( } fn next_transport_request_id() -> u32 { - TRANSPORT_REQUEST_ID.fetch_add(1, Ordering::Relaxed) + // Relaxed would already produce unique values across threads (fetch_add is + // atomic regardless of ordering). AcqRel is a conservative choice; it only + // orders surrounding memory operations and does not change uniqueness. 
+ TRANSPORT_REQUEST_ID.fetch_add(1, Ordering::AcqRel) } fn effective_partition_key_bytes(inputs: &WrapInputs<'_>) -> azure_core::Result>> { From 27218fbe070a76894177df24f6140d36bf8aca47 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Thu, 30 Apr 2026 01:02:19 -0700 Subject: [PATCH 30/48] Add Gateway 2.0 Phase 6 testing & infrastructure MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds the Phase 6 deliverables called out in `docs/GATEWAY_20_SPEC.md` (lines ~500-594): **CI infrastructure (NEW)** - `sdk/cosmos/ci-gateway20.yml` — dedicated PR-and-manual-dispatch pipeline that runs Gateway 2.0 live tests against a pre-provisioned thin-client account. Reads `AZURE_COSMOS_GW20_ENDPOINT` / `AZURE_COSMOS_GW20_KEY` from pipeline secrets (per spec Q2). - `sdk/cosmos/live-gateway20-matrix.json` — matrix with `gateway20` and `gateway20_multi_region` test categories. **Driver pipeline tests (NEW)** - `azure_data_cosmos_driver/tests/gateway20_pipeline_tests.rs` — 7 integration tests gated on the `__internal_mocking` feature. Uses a capturing `HttpClientFactory` to inspect outgoing requests. Coverage: operator-override pool flag, V1 dual-consistency-header invariant, V2 dual-token contract lock, capabilities header pin (`x-ms-cosmos-sdk-supportedcapabilities = "9"`). Live-account companions for stored-proc fallback, diagnostics validation, and operator override are stubbed with TODO(Phase 6) markers — they require either a public SDK Gateway 2.0 toggle or per-request diagnostics surfaces that don't exist yet. **Driver fault injection (EDIT)** - `driver_fault_injection.rs` — adds 3 emulator-gated contract locks: 503 → regional failover, 408 → cross-region for reads, 404/1002 → remote-preferred without PKRange refresh. 
Today the rules fire on whichever transport is selected at dispatch (the `FaultInjectionCondition` API does not yet support a per-transport-kind filter); each test carries a TODO(Phase 6) describing the tightening once that filter lands. **SDK tests (NEW + EDIT)** - `azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs` — 6 scaffolded E2E stubs gated on `test_category = "gateway20"` plus the `AZURE_COSMOS_GW20_ENDPOINT/_KEY` env vars. Bodies are intentionally empty until `CosmosClientOptions` exposes a public Gateway 2.0 toggle; the test names lock in the contract. - `cosmos_fault_injection.rs` — adds a Gateway 2.0 ConnectionError fallback contract lock with a TODO(Phase 6) marker. - `mod.rs` — registers the new `gateway20_e2e` module. - `azure_data_cosmos/build.rs` — adds `gateway20` to the recognized `test_category` cfg values so the new e2e file compiles without `unexpected_cfgs` warnings. **Verification:** `cargo fmt`, `cargo clippy --all-features --all-targets -- -D warnings`, `cargo build --all-features --tests`, and `cargo test --all-features` all pass on both crates. Driver maintains the 744-test baseline; SDK maintains the 463-test baseline. 
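
The E2E stubs are gated in two layers. At compile time, `#[cfg_attr(not(test_category = "gateway20"), ignore = ...)]` marks each test as ignored unless the crate is built with `--cfg test_category="gateway20"`, which is why `build.rs` has to declare the new value. At run time, a test that is not ignored (or is forced with `cargo test -- --include-ignored`) still returns early unless both secrets are present and non-blank. The sketch below illustrates only the runtime half, assuming a hypothetical `live_credentials_from` helper that takes the variable lookup as a closure so the skip logic can be exercised without touching the process environment. It is an illustration, not the helper this patch adds.

```rust
/// Hypothetical, testable variant of the `live_credentials` helper in
/// `gateway20_e2e.rs`: the variable lookup is injected instead of reading
/// `std::env` directly.
fn live_credentials_from<F>(lookup: F) -> Option<(String, String)>
where
    F: Fn(&str) -> Option<String>,
{
    // Unset and whitespace-only values are both treated as "missing".
    let read = |name: &str| lookup(name).filter(|v| !v.trim().is_empty());
    Some((
        read("AZURE_COSMOS_GW20_ENDPOINT")?,
        read("AZURE_COSMOS_GW20_KEY")?,
    ))
}

fn main() {
    // Only the endpoint is configured: the test body would `return` early.
    let endpoint_only = |name: &str| {
        (name == "AZURE_COSMOS_GW20_ENDPOINT")
            .then(|| "https://example.documents.azure.com:443/".to_owned())
    };
    assert!(live_credentials_from(endpoint_only).is_none());

    // A whitespace-only key also counts as missing.
    let blank_key = |name: &str| {
        let value = if name == "AZURE_COSMOS_GW20_KEY" { "   " } else { "https://example" };
        Some(value.to_owned())
    };
    assert!(live_credentials_from(blank_key).is_none());

    // Both values present: the live test would proceed.
    let both = |name: &str| Some(format!("value-for-{name}"));
    assert_eq!(
        live_credentials_from(both),
        Some((
            "value-for-AZURE_COSMOS_GW20_ENDPOINT".to_owned(),
            "value-for-AZURE_COSMOS_GW20_KEY".to_owned(),
        ))
    );
    println!("gating checks passed");
}
```

Injecting the lookup keeps such a check hermetic; a real helper could stay a thin wrapper that passes `|name: &str| std::env::var(name).ok()`.
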
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- sdk/cosmos/azure_data_cosmos/build.rs | 4 +- .../emulator_tests/cosmos_fault_injection.rs | 88 ++++ .../tests/emulator_tests/gateway20_e2e.rs | 159 +++++++ .../tests/emulator_tests/mod.rs | 1 + .../emulator_tests/driver_fault_injection.rs | 189 ++++++++ .../tests/gateway20_pipeline_tests.rs | 444 ++++++++++++++++++ sdk/cosmos/ci-gateway20.yml | 52 ++ sdk/cosmos/live-gateway20-matrix.json | 21 + 8 files changed, 957 insertions(+), 1 deletion(-) create mode 100644 sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs create mode 100644 sdk/cosmos/azure_data_cosmos_driver/tests/gateway20_pipeline_tests.rs create mode 100644 sdk/cosmos/ci-gateway20.yml create mode 100644 sdk/cosmos/live-gateway20-matrix.json diff --git a/sdk/cosmos/azure_data_cosmos/build.rs b/sdk/cosmos/azure_data_cosmos/build.rs index 9862a46d7ee..b22c7b395db 100644 --- a/sdk/cosmos/azure_data_cosmos/build.rs +++ b/sdk/cosmos/azure_data_cosmos/build.rs @@ -6,5 +6,7 @@ // unknown cfg names are warned/denied unless explicitly declared via check-cfg. fn main() { // Allow `#[cfg_attr(not(test_category = "..."), ignore)]` in `tests/*.rs`. 
- println!("cargo:rustc-check-cfg=cfg(test_category, values(\"emulator\", \"multi_write\", \"split\"))"); + println!( + "cargo:rustc-check-cfg=cfg(test_category, values(\"emulator\", \"multi_write\", \"split\", \"gateway20\"))" + ); } diff --git a/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/cosmos_fault_injection.rs b/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/cosmos_fault_injection.rs index ebb2908a68c..fcfd2f7f848 100644 --- a/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/cosmos_fault_injection.rs +++ b/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/cosmos_fault_injection.rs @@ -996,3 +996,91 @@ pub async fn fault_injection_enable_disable_rule() -> Result<(), Box> ) .await } + +// ---------------------------------------------------------------------------- +// Gateway 2.0 fault injection coverage (Phase 6) +// ---------------------------------------------------------------------------- + +/// Gateway 2.0 ConnectionError should fall back to the standard gateway +/// transparently — the client must not surface the connection failure to the +/// caller when a usable fallback transport exists. +/// +/// **Limitations**: +/// * `FaultInjectionCondition` does not yet expose a per-transport-kind +/// filter, so the rule fires on whichever transport is selected at dispatch +/// time. Today the SDK does not expose a public Gateway 2.0 enable API +/// (see `CosmosClientOptions`), so the SDK currently never selects the +/// Gateway 2.0 transport — this test runs in standard-gateway mode only. +/// * Once `CosmosClientOptions` exposes a Gateway 2.0 toggle and +/// `FaultInjectionCondition` supports `with_transport_kind`, scope this +/// rule to `TransportKind::Gateway20` and assert the request **succeeds** +/// with the standard-gateway fallback recorded in diagnostics. 
+#[tokio::test] +#[cfg_attr( + not(test_category = "emulator"), + ignore = "requires test_category 'emulator'" +)] +pub async fn gateway20_connection_error_falls_back_to_standard_gateway( +) -> Result<(), Box> { + let server_error = FaultInjectionResultBuilder::new() + .with_error(FaultInjectionErrorType::ConnectionError) + .with_probability(1.0) + .build(); + + let condition = FaultInjectionConditionBuilder::new() + .with_operation_type(FaultOperationType::ReadItem) + .build(); + + let rule = FaultInjectionRuleBuilder::new("gateway20-conn-error-fallback", server_error) + .with_condition(condition) + .build(); + + let fault_builder = FaultInjectionClientBuilder::new().with_rule(Arc::new(rule)); + + TestClient::run_with_unique_db( + async |run_context, db_client| { + let container_id = format!("Container-{}", Uuid::new_v4()); + let container_client = run_context + .create_container_with_throughput( + db_client, + ContainerProperties::new(container_id.clone(), "/partition_key".into()), + ThroughputProperties::manual(400), + ) + .await?; + + let unique_id = Uuid::new_v4().to_string(); + let item = create_test_item(&unique_id); + let pk = format!("Partition-{}", unique_id); + let item_id = format!("Item-{}", unique_id); + + container_client.create_item(&pk, &item, None).await?; + + let fault_client = run_context + .fault_client() + .expect("fault client should be available"); + let fault_db_client = fault_client.database_client(db_client.id()); + let fault_container_client = fault_db_client.container_client(&container_id).await?; + + // Today the rule fires on the standard gateway path (the SDK does + // not yet route through Gateway 2.0). The read should fail because + // there is no further fallback below the standard gateway. 
+ // + // TODO(Phase 6): once the SDK exposes a public Gateway 2.0 + // enable API and `FaultInjectionCondition` supports a + // per-transport-kind filter, scope the rule to + // `TransportKind::Gateway20` and assert this read SUCCEEDS via + // the standard-gateway fallback. + let result = fault_container_client + .read_item::(&pk, &item_id, None) + .await; + assert!( + result.is_err(), + "Today the read should fail; once Gateway 2.0 fallback lands, this should succeed" + ); + + Ok(()) + }, + Some(TestOptions::new().with_fault_injection_builder(fault_builder)), + ) + .await +} diff --git a/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs b/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs new file mode 100644 index 00000000000..a8fdcf2bfbf --- /dev/null +++ b/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs @@ -0,0 +1,159 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +//! End-to-end tests for the Gateway 2.0 transport, exercised through the +//! `azure_data_cosmos` SDK surface (not the underlying driver crate). +//! +//! These tests run against a pre-provisioned Gateway 2.0 ("thin client") +//! account. The endpoint and primary key are read from the +//! `AZURE_COSMOS_GW20_ENDPOINT` and `AZURE_COSMOS_GW20_KEY` environment +//! variables and gated by the `gateway20` test category. They are skipped by +//! default; the dedicated `ci-gateway20.yml` pipeline sets the matrix entry's +//! `testCategory` to `gateway20` (or `gateway20_multi_region`) so the tests +//! run in CI against the live account. +//! +//! ## Current state +//! +//! Every test in this file is a placeholder stub: +//! +//! * [`CosmosClientOptions`](azure_data_cosmos::CosmosClientOptions) does not +//! yet expose a public Gateway 2.0 enable/disable setter. Without it, the +//! SDK cannot deterministically opt a client into Gateway 2.0 from outside +//! the crate. +//! 
 * The driver-level toggle +//! (`ConnectionPoolOptions::with_is_gateway20_allowed`) is not wired through +//! to `CosmosClientOptions`. +//! +//! Once the SDK exposes a public `with_gateway20_disabled` (or equivalent) +//! setter, fill in each test body. Until then, the bodies are intentionally +//! empty so the file compiles and the test names lock in the contract. + +#![cfg(feature = "key_auth")] + +fn read_env(name: &str) -> Option<String> { + std::env::var(name).ok().filter(|v| !v.trim().is_empty()) +} + +/// Returns `Some((endpoint, key))` only when both env vars are set to non-blank values. +fn live_credentials() -> Option<(String, String)> { + Some(( + read_env("AZURE_COSMOS_GW20_ENDPOINT")?, + read_env("AZURE_COSMOS_GW20_KEY")?, + )) +} + +/// Drives a point CRUD round-trip (create → read → replace → delete) against +/// the live Gateway 2.0 account. +/// +/// TODO(Phase 6): implement once `CosmosClientOptions` exposes a Gateway 2.0 +/// toggle. Build a client with that toggle enabled, drive the CRUD operations, +/// and assert each request reports `TransportKind::Gateway20` in diagnostics. +#[tokio::test] +#[cfg_attr( + not(test_category = "gateway20"), + ignore = "requires test_category 'gateway20' and AZURE_COSMOS_GW20_ENDPOINT/_KEY" +)] +pub async fn gateway20_point_crud_round_trip() { + let Some((_endpoint, _key)) = live_credentials() else { + return; + }; + // TODO(Phase 6): build a CosmosClient with Gateway 2.0 enabled and run a + // create/read/replace/delete cycle on a single item, asserting each + // response surfaces `TransportKind::Gateway20` in diagnostics. +} + +/// Runs a SQL query through Gateway 2.0 and asserts the streamed pages all +/// route through the thin-client transport. +/// +/// TODO(Phase 6): implement once `CosmosClientOptions` exposes a Gateway 2.0 +/// toggle. Open a query feed, iterate every page, and assert the diagnostics +/// for each page record `TransportKind::Gateway20`. 
+#[tokio::test] +#[cfg_attr( + not(test_category = "gateway20"), + ignore = "requires test_category 'gateway20' and AZURE_COSMOS_GW20_ENDPOINT/_KEY" +)] +pub async fn gateway20_query_streams_through_thin_client() { + let Some((_endpoint, _key)) = live_credentials() else { + return; + }; + // TODO(Phase 6): drive a multi-page SELECT query and assert every page's + // diagnostics report `TransportKind::Gateway20`. +} + +/// Runs a transactional batch through Gateway 2.0. +/// +/// TODO(Phase 6): implement once `CosmosClientOptions` exposes a Gateway 2.0 +/// toggle. Submit a batch with mixed create/upsert/delete operations and +/// assert it routes through Gateway 2.0 end-to-end. +#[tokio::test] +#[cfg_attr( + not(test_category = "gateway20"), + ignore = "requires test_category 'gateway20' and AZURE_COSMOS_GW20_ENDPOINT/_KEY" +)] +pub async fn gateway20_transactional_batch() { + let Some((_endpoint, _key)) = live_credentials() else { + return; + }; + // TODO(Phase 6): submit a mixed-op transactional batch and assert the + // diagnostics report `TransportKind::Gateway20`. +} + +/// Drives a `LatestVersion` change feed iterator through Gateway 2.0. +/// +/// TODO(Phase 6): implement once `CosmosClientOptions` exposes a Gateway 2.0 +/// toggle. Open a change feed iterator with `mode = LatestVersion` and assert +/// pages are served via Gateway 2.0. +#[tokio::test] +#[cfg_attr( + not(test_category = "gateway20"), + ignore = "requires test_category 'gateway20' and AZURE_COSMOS_GW20_ENDPOINT/_KEY" +)] +pub async fn gateway20_change_feed_latest_version() { + let Some((_endpoint, _key)) = live_credentials() else { + return; + }; + // TODO(Phase 6): consume a LatestVersion change feed and assert the + // diagnostics report `TransportKind::Gateway20`. +} + +/// Verifies that `RequestDiagnostics` correctly reports +/// `TransportKind::Gateway20` for SDK-issued requests. 
+/// +/// TODO(Phase 6): implement once `CosmosClientOptions` exposes a Gateway 2.0 +/// toggle and the SDK surfaces `TransportKind` on its public diagnostics +/// type. +#[tokio::test] +#[cfg_attr( + not(test_category = "gateway20"), + ignore = "requires test_category 'gateway20' and AZURE_COSMOS_GW20_ENDPOINT/_KEY" +)] +pub async fn gateway20_diagnostics_validation() { + let Some((_endpoint, _key)) = live_credentials() else { + return; + }; + // TODO(Phase 6): perform a single read against the live account and + // assert the resulting diagnostics record `TransportKind::Gateway20`. +} + +/// Verifies the operator override at the SDK boundary: when the operator +/// disables Gateway 2.0 via the public client option, every request must +/// route through the standard gateway even though the account advertises a +/// thin-client endpoint. +/// +/// TODO(Phase 6): implement once `CosmosClientOptions` exposes a public +/// Gateway 2.0 toggle. Build a client with the toggle disabled, drive a +/// point read, and assert diagnostics report `TransportKind::StandardGateway`. +#[tokio::test] +#[cfg_attr( + not(test_category = "gateway20"), + ignore = "requires test_category 'gateway20' and AZURE_COSMOS_GW20_ENDPOINT/_KEY" +)] +pub async fn gateway20_operator_override_at_sdk_boundary() { + let Some((_endpoint, _key)) = live_credentials() else { + return; + }; + // TODO(Phase 6): build a CosmosClient with Gateway 2.0 explicitly + // disabled, drive a point read, and assert the diagnostics report + // `TransportKind::StandardGateway`. 
+} diff --git a/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/mod.rs b/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/mod.rs index 02b9779b1b3..bc8689e0c44 100644 --- a/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/mod.rs +++ b/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/mod.rs @@ -10,6 +10,7 @@ mod cosmos_items; mod cosmos_offers; mod cosmos_proxy; mod cosmos_query; +mod gateway20_e2e; #[path = "../framework/mod.rs"] mod framework; diff --git a/sdk/cosmos/azure_data_cosmos_driver/tests/emulator_tests/driver_fault_injection.rs b/sdk/cosmos/azure_data_cosmos_driver/tests/emulator_tests/driver_fault_injection.rs index de3810bc294..2b89c026ce6 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/tests/emulator_tests/driver_fault_injection.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/tests/emulator_tests/driver_fault_injection.rs @@ -327,3 +327,192 @@ pub async fn fault_injection_connection_error() -> Result<(), Box> { }) .await } + +// ---------------------------------------------------------------------------- +// Gateway 2.0 fault injection coverage (Phase 6) +// ---------------------------------------------------------------------------- +// +// The following three tests lock in the retry/failover behavior the Gateway +// 2.0 transport must exhibit when the underlying thin-client connection fails. +// Each test exercises a distinct failure shape: +// +// - 503 Service Unavailable → regional failover +// - 408 Request Timeout → cross-region for reads / local-only for writes +// - 404/1002 Read Session → remote-preferred + no PKRange refresh +// +// **Limitation**: `FaultInjectionCondition` does not yet expose a per-transport- +// kind filter — there is no `with_transport_kind(TransportKind::Gateway20)` +// today. As a result, faults injected here apply to whichever transport happens +// to be selected at dispatch time. 
To reliably exercise these against Gateway +// 2.0, the Phase 6 CI matrix must run them on a live thin-client account +// (`testCategory = 'gateway20'`); the emulator does not yet expose Gateway +// 2.0 endpoints. See `docs/GATEWAY_20_SPEC.md` (Phase 6) for the harness gap. + +/// Gateway 2.0 503 Service Unavailable should trigger regional failover. +/// +/// TODO(Phase 6): once `FaultInjectionCondition` supports a per-transport-kind +/// filter, scope this rule to `TransportKind::Gateway20` so it doesn't also +/// fire on standard-gateway requests issued during account discovery. +#[tokio::test] +#[cfg_attr( + not(test_category = "emulator"), + ignore = "requires test_category 'emulator'" +)] +pub async fn gateway20_service_unavailable_triggers_regional_failover() -> Result<(), Box> +{ + let condition = FaultInjectionConditionBuilder::new() + .with_operation_type(FaultOperationType::ReadItem) + .build(); + + let result = FaultInjectionResultBuilder::new() + .with_error(FaultInjectionErrorType::ServiceUnavailable) + .with_probability(1.0) + .build(); + + let rule = Arc::new( + FaultInjectionRuleBuilder::new("gateway20-503-failover", result) + .with_condition(condition) + .build(), + ); + let rules = vec![Arc::clone(&rule)]; + + DriverTestClient::run_with_unique_db_and_fault_injection(rules, async |context, database| { + let container_name = context.unique_container_name(); + let container = context + .create_container(&database, &container_name, "/pk") + .await?; + + let item_json = br#"{"id": "item1", "pk": "pk1", "value": "test"}"#; + context.create_item(&container, "pk1", item_json).await?; + + // The read should fail (single region, fault always fires) but the + // failover machinery must have been invoked. Once `RequestDiagnostics` + // exposes per-attempt endpoint selection, assert that the diagnostics + // record at least one regional failover attempt. 
+ let read_result = context.read_item(&container, "item1", "pk1").await; + assert!( + read_result.is_err(), + "Read should fail when 503 fires on every attempt" + ); + + assert!(rule.hit_count() > 0, "Rule should have been hit"); + + Ok(()) + }) + .await +} + +/// Gateway 2.0 408 Request Timeout should retry across regions for reads, +/// but stay local-only for writes (single-region writes can't safely retry +/// across regions without risking duplicates). +/// +/// TODO(Phase 6): once `FaultInjectionCondition` supports a per-transport-kind +/// filter, scope this rule to `TransportKind::Gateway20`. Today the emulator +/// only exposes the standard gateway, so this test executes against the +/// standard transport in the emulator — it acts as a contract lock for the +/// behavior that must also hold on Gateway 2.0 once a thin-client account is +/// available in CI. +#[tokio::test] +#[cfg_attr( + not(test_category = "emulator"), + ignore = "requires test_category 'emulator'" +)] +pub async fn gateway20_request_timeout_cross_region_for_reads() -> Result<(), Box> { + let condition = FaultInjectionConditionBuilder::new() + .with_operation_type(FaultOperationType::ReadItem) + .build(); + + let result = FaultInjectionResultBuilder::new() + .with_error(FaultInjectionErrorType::Timeout) + .with_probability(1.0) + .build(); + + let rule = Arc::new( + FaultInjectionRuleBuilder::new("gateway20-408-cross-region", result) + .with_condition(condition) + .build(), + ); + let rules = vec![Arc::clone(&rule)]; + + DriverTestClient::run_with_unique_db_and_fault_injection(rules, async |context, database| { + let container_name = context.unique_container_name(); + let container = context + .create_container(&database, &container_name, "/pk") + .await?; + + let item_json = br#"{"id": "item1", "pk": "pk1", "value": "test"}"#; + context.create_item(&container, "pk1", item_json).await?; + + let read_result = context.read_item(&container, "item1", "pk1").await; + assert!( + 
read_result.is_err(), + "Read should ultimately fail when 408 fires on every attempt" + ); + + // TODO(Phase 6): once diagnostics expose retry attempts, assert that + // a single-region account exhausts local-only retries while a + // multi-region account performs at least one cross-region attempt. + assert!(rule.hit_count() > 0, "Rule should have been hit"); + + Ok(()) + }) + .await +} + +/// Gateway 2.0 404/1002 ReadSessionNotAvailable must trigger a +/// remote-preferred retry path **without** invalidating the partition-key +/// range (PKRange) cache. The 404/1002 substatus indicates a session-token +/// mismatch, which is unrelated to the routing topology — refreshing PKRange +/// would be a wasted metadata round-trip. +/// +/// TODO(Phase 6): once `FaultInjectionCondition` supports a per-transport-kind +/// filter, scope to `TransportKind::Gateway20`. Today, asserting the absence +/// of a PKRange refresh requires diagnostics that record metadata-cache hits +/// — this test is the contract lock and will tighten its assertion once that +/// observability is in place. 
+#[tokio::test] +#[cfg_attr( + not(test_category = "emulator"), + ignore = "requires test_category 'emulator'" +)] +pub async fn gateway20_read_session_not_available_remote_preferred() -> Result<(), Box<dyn Error>> { + let condition = FaultInjectionConditionBuilder::new() + .with_operation_type(FaultOperationType::ReadItem) + .build(); + + let result = FaultInjectionResultBuilder::new() + .with_error(FaultInjectionErrorType::ReadSessionNotAvailable) + .with_probability(1.0) + .build(); + + let rule = Arc::new( + FaultInjectionRuleBuilder::new("gateway20-1002-remote-preferred", result) + .with_condition(condition) + .build(), + ); + let rules = vec![Arc::clone(&rule)]; + + DriverTestClient::run_with_unique_db_and_fault_injection(rules, async |context, database| { + let container_name = context.unique_container_name(); + let container = context + .create_container(&database, &container_name, "/pk") + .await?; + + let item_json = br#"{"id": "item1", "pk": "pk1", "value": "test"}"#; + context.create_item(&container, "pk1", item_json).await?; + + let read_result = context.read_item(&container, "item1", "pk1").await; + assert!( + read_result.is_err(), + "Read should fail when 404/1002 fires on every attempt" + ); + + // TODO(Phase 6): once diagnostics record metadata-cache hits, assert + // that the PKRange cache was NOT refreshed during these retries (a + // 404/1002 is a session-token issue, not a routing-topology issue). + assert!(rule.hit_count() > 0, "Rule should have been hit"); + + Ok(()) + }) + .await +} diff --git a/sdk/cosmos/azure_data_cosmos_driver/tests/gateway20_pipeline_tests.rs b/sdk/cosmos/azure_data_cosmos_driver/tests/gateway20_pipeline_tests.rs new file mode 100644 index 00000000000..19a550f5869 --- /dev/null +++ b/sdk/cosmos/azure_data_cosmos_driver/tests/gateway20_pipeline_tests.rs @@ -0,0 +1,444 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +//!
Integration tests that lock in the Gateway 2.0 transport pipeline contract. +//! +//! These tests cover Phase 6 of the Gateway 2.0 specification (see +//! `docs/GATEWAY_20_SPEC.md`). They run as a standalone integration target so +//! they exercise the public surface of the driver crate end-to-end (no +//! `pub(crate)` access). +//! +//! ## Categories +//! +//! 1. **Operator override** — the operator can opt out of Gateway 2.0 even when +//! the account advertises a thin-client endpoint. Verified via the public +//! [`ConnectionPoolOptions::with_is_gateway20_allowed`] toggle. +//! +//! 2. **Operation eligibility** — operations that Gateway 2.0 does not yet +//! support (e.g., stored procedure execution) must transparently fall back +//! to the standard gateway. Documented as an env-gated stub today; the +//! inside-crate routing tests in `operation_pipeline.rs` cover the +//! decision logic. +//! +//! 3. **Diagnostics fidelity** — `RequestDiagnostics` records the actual +//! `TransportKind` used. Documented as an env-gated stub today. +//! +//! 4. **Dual-consistency invariants (V1)** — the V1 HTTP path must never emit +//! *both* the legacy `x-ms-consistency-level` and the newer +//! `x-ms-cosmos-read-consistency-strategy` headers. Asserted via captured +//! HTTP requests through the `__internal_mocking` factory. +//! +//! 5. **Dual-consistency invariants (V2)** — the V2 RNTBD path must never +//! serialize *both* `ConsistencyLevel` and a separate +//! `ReadConsistencyStrategy` token. Documented as an invariant lock; the +//! underlying RNTBD enum currently exposes only the `ConsistencyLevel` +//! token (`tokens.rs`), so the invariant is structurally guaranteed. +//! +//! 6. **Capabilities header pin** — every outgoing request carries +//! `x-ms-cosmos-sdk-supportedcapabilities = "9"`. Asserted on every +//! request captured through the mock factory. +//! +//! ## Why `__internal_mocking`? +//! +//!
Several of these contracts can only be observed at the network boundary. +//! The driver exposes a [`HttpClientFactory`] override under the +//! `__internal_mocking` feature flag specifically for tests like these — it +//! lets us substitute a capturing transport so we can inspect the very first +//! request the runtime emits (the account-properties probe), without ever +//! touching the network. + +#![cfg(feature = "__internal_mocking")] + +use std::sync::{Arc, Mutex}; + +use async_trait::async_trait; +use azure_data_cosmos_driver::models::{AccountReference, CosmosOperation, DatabaseReference}; +use azure_data_cosmos_driver::options::{DriverOptions, OperationOptions}; +use azure_data_cosmos_driver::testing::{ + ConnectionPoolOptions, HttpClientConfig, HttpClientFactory, HttpRequest, HttpResponse, + TransportClient, TransportError, +}; +use azure_data_cosmos_driver::CosmosDriverRuntime; +use url::Url; + +// ---------------------------------------------------------------------------- +// Capturing transport +// ---------------------------------------------------------------------------- + +/// Records every outgoing request. By default, every send returns a +/// connection-style failure so the runtime aborts before the second hop, which +/// keeps the test focused on the first wire frame. 
+#[derive(Debug, Default)] +struct CapturingTransport { + requests: Mutex<Vec<HttpRequest>>, +} + +impl CapturingTransport { + fn requests(&self) -> Vec<HttpRequest> { + self.requests + .lock() + .expect("poisoned capture mutex") + .clone() + } +} + +#[async_trait] +impl TransportClient for CapturingTransport { + async fn send(&self, request: &HttpRequest) -> Result<HttpResponse, TransportError> { + self.requests + .lock() + .expect("poisoned capture mutex") + .push(request.clone()); + + Err(TransportError::new( + azure_core::Error::with_message( + azure_core::error::ErrorKind::Other, + "capturing transport refuses every request", + ), + azure_data_cosmos_driver::diagnostics::RequestSentStatus::NotSent, + )) + } +} + +#[derive(Debug)] +struct CapturingFactory { + transport: Arc<CapturingTransport>, +} + +impl CapturingFactory { + fn new() -> (Self, Arc<CapturingTransport>) { + let transport = Arc::new(CapturingTransport::default()); + ( + Self { + transport: transport.clone(), + }, + transport, + ) + } +} + +impl HttpClientFactory for CapturingFactory { + fn build( + &self, + _connection_pool: &ConnectionPoolOptions, + _config: HttpClientConfig, + ) -> azure_core::Result<Arc<dyn TransportClient>> { + Ok(self.transport.clone() as Arc<dyn TransportClient>) + } +} + +// ---------------------------------------------------------------------------- +// Helpers +// ---------------------------------------------------------------------------- + +fn fake_account() -> AccountReference { + let url = + Url::parse("https://gw20-pipeline-tests.documents.azure.com/").expect("static URL parses"); + // Master-key value is base64-encoded; the bytes never reach the wire because + // the capturing transport short-circuits every send.
+ AccountReference::with_master_key(url, "dGVzdC1tYXN0ZXIta2V5") +} + +fn read_env(name: &str) -> Option<String> { + std::env::var(name).ok().filter(|v| !v.trim().is_empty()) +} + +fn live_account_from_env() -> Option<AccountReference> { + let endpoint = read_env("AZURE_COSMOS_GW20_ENDPOINT")?; + let key = read_env("AZURE_COSMOS_GW20_KEY")?; + let url = Url::parse(&endpoint).ok()?; + Some(AccountReference::with_master_key(url, key)) +} + +/// Builds a runtime with the capturing factory and the requested +/// gateway-20 toggle. The flag mirrors the operator override exposed via +/// `ConnectionPoolOptions`. +async fn capturing_runtime( + is_gateway20_allowed: bool, +) -> (Arc<CosmosDriverRuntime>, Arc<CapturingTransport>) { + let (factory, transport) = CapturingFactory::new(); + let pool = ConnectionPoolOptions::builder() + .with_is_gateway20_allowed(is_gateway20_allowed) + .build() + .expect("connection pool builds"); + let runtime = CosmosDriverRuntime::builder() + .with_connection_pool(pool) + .with_mock_http_client_factory(Arc::new(factory)) + .build() + .await + .expect("runtime builds with mock factory"); + (runtime, transport) +} + +/// Drive a no-op probe so the runtime emits at least one HTTP request. +/// +/// The capturing transport refuses every send, so this always returns an +/// error. We only care about the captured frames. +async fn probe(runtime: &Arc<CosmosDriverRuntime>) { + let account = fake_account(); + let options = DriverOptions::builder(account.clone()).build(); + let _ = runtime.get_or_create_driver(account, Some(options)).await; +} + +// ---------------------------------------------------------------------------- +// (a) Operator override forces standard gateway routing +// ---------------------------------------------------------------------------- + +/// Verifies that the operator override flag (`with_is_gateway20_allowed(false)`) +/// is honored at the connection-pool level.
When the flag is off, +/// the runtime must not select the Gateway 2.0 transport even if account +/// metadata advertises a thin-client endpoint. +/// +/// We assert the contract structurally via `ConnectionPoolOptions`: when the +/// flag is `false`, `is_gateway20_allowed()` reports `false`, and the +/// transport-layer dispatcher branches to the standard gateway (this branching +/// is covered by the inside-crate tests in +/// `driver::transport::tests::dataplane_transport_*`). +#[tokio::test] +async fn operator_override_disables_gateway20_at_pool_level() { + let off = ConnectionPoolOptions::builder() + .with_is_gateway20_allowed(false) + .build() + .expect("pool builds"); + assert!( + !off.is_gateway20_allowed(), + "operator-disabled pool must report is_gateway20_allowed = false" + ); + + let on = ConnectionPoolOptions::builder() + .with_is_gateway20_allowed(true) + .build() + .expect("pool builds"); + assert!( + on.is_gateway20_allowed(), + "operator-enabled pool must report is_gateway20_allowed = true" + ); +} + +/// Live-account companion to the above. Drives a real read against a +/// pre-provisioned Gateway 2.0 account with the operator override turned off, +/// then asserts (TODO once diagnostics expose `TransportKind`) that the +/// request used the standard gateway transport. +#[tokio::test] +#[ignore = "Requires AZURE_COSMOS_GW20_ENDPOINT/_KEY to a Gateway 2.0 account"] +async fn operator_override_routes_reads_to_standard_gateway() { + let Some(account) = live_account_from_env() else { + return; + }; + + // TODO(Phase 6): once diagnostics expose `TransportKind` per request, + // assert that every request used `TransportKind::StandardGateway`. 
+ let pool = ConnectionPoolOptions::builder() + .with_is_gateway20_allowed(false) + .build() + .expect("pool builds"); + let runtime = CosmosDriverRuntime::builder() + .with_connection_pool(pool) + .build() + .await + .expect("runtime builds"); + let driver = runtime + .get_or_create_driver(account.clone(), None) + .await + .expect("driver init succeeds against the live account"); + + let db = read_env("AZURE_COSMOS_GW20_DATABASE").unwrap_or_else(|| "gw20-tests".to_string()); + let db_ref = DatabaseReference::from_name(driver.account().clone(), db); + + let _ = driver + .execute_operation( + CosmosOperation::read_database(db_ref), + OperationOptions::default(), + ) + .await; +} + +// ---------------------------------------------------------------------------- +// (b) Operation eligibility fallback (StoredProc Execute → standard gateway) +// ---------------------------------------------------------------------------- + +/// Stored procedure execution is not yet supported by Gateway 2.0 and must +/// fall back to the standard gateway transparently. +/// +/// The eligibility decision is made in `resolve_endpoint` +/// (operation_pipeline.rs); the inside-crate tests in +/// `driver::pipeline::operation_pipeline::tests::resolve_endpoint_*` cover the +/// matrix exhaustively. This standalone test is the live-account contract +/// lock — once `TransportKind` is exposed in diagnostics, assert that the +/// stored-procedure-execute request used `TransportKind::StandardGateway` +/// while a co-located point read on the same account used +/// `TransportKind::Gateway20`. 
+#[tokio::test] +#[ignore = "Requires AZURE_COSMOS_GW20_ENDPOINT/_KEY plus a stored procedure resource"] +async fn stored_proc_execute_falls_back_to_standard_gateway() { + let Some(_account) = live_account_from_env() else { + return; + }; + // TODO(Phase 6): drive `CosmosOperation::execute_stored_procedure(...)` + // against a real account and assert the diagnostics record + // `TransportKind::StandardGateway` for that request specifically while + // co-located point reads/writes record `TransportKind::Gateway20`. +} + +// ---------------------------------------------------------------------------- +// (c) Diagnostics records TransportKind::Gateway20 +// ---------------------------------------------------------------------------- + +/// Once Gateway 2.0 has dispatched a request, the recorded +/// `RequestDiagnostics` for that request must indicate `TransportKind::Gateway20`. +/// +/// This contract requires a live thin-client account. The inside-crate test +/// `transport_pipeline::tests::gateway20_pipeline_records_transport_kind` +/// already covers the wiring at the unit-test level; this standalone test is +/// the live-account companion. +#[tokio::test] +#[ignore = "Requires AZURE_COSMOS_GW20_ENDPOINT/_KEY to a Gateway 2.0 account"] +async fn diagnostics_records_gateway20_transport_kind() { + let Some(_account) = live_account_from_env() else { + return; + }; + // TODO(Phase 6): once `TransportKind` is exposed on the public + // `RequestDiagnostics`, drive a point read against the live Gateway 2.0 + // account and assert the diagnostics report `TransportKind::Gateway20`. +} + +// ---------------------------------------------------------------------------- +// (d) V1 HTTP dual-consistency-header invariant +// ---------------------------------------------------------------------------- + +/// The V1 HTTP path must never emit *both* the legacy +/// `x-ms-consistency-level` header and the newer +/// `x-ms-cosmos-read-consistency-strategy` header on the same request. 
+/// +/// Today the V1 path emits *neither* header (consistency is propagated via +/// the operation context, not a wire header), so the invariant trivially +/// holds. We capture every wire frame the runtime emits and assert the +/// pair-presence rule on each one. +#[tokio::test] +async fn v1_http_never_emits_both_consistency_headers() { + const LEGACY: &str = "x-ms-consistency-level"; + const STRATEGY: &str = "x-ms-cosmos-read-consistency-strategy"; + + let (runtime, transport) = capturing_runtime(false).await; + probe(&runtime).await; + + let captured = transport.requests(); + for req in &captured { + let has_legacy = req.headers.iter().any(|(name, _)| name.as_str() == LEGACY); + let has_strategy = req + .headers + .iter() + .any(|(name, _)| name.as_str() == STRATEGY); + assert!( + !(has_legacy && has_strategy), + "request {:?} emitted both '{LEGACY}' and '{STRATEGY}' — V1 invariant violated", + req.url + ); + } +} + +// ---------------------------------------------------------------------------- +// (e) V2 RNTBD dual-consistency-token invariant +// ---------------------------------------------------------------------------- + +/// The V2 (RNTBD) path must never serialize *both* a `ConsistencyLevel` token +/// and a separate `ReadConsistencyStrategy` token on the same wrapped frame. +/// +/// Today the RNTBD token enum +/// (`driver::transport::rntbd::tokens::RntbdRequestToken`) exposes only the +/// `ConsistencyLevel` variant — there is no `ReadConsistencyStrategy` token +/// at all — so the invariant is structurally guaranteed by the type system. +/// This test is therefore a *contract lock* expressed at the boundary this +/// integration test can actually observe. +/// +/// `CapturingTransport` lives at the `HttpClientFactory` layer, so it only +/// ever sees V1 HTTP requests (account-properties probe, metadata reads, +/// etc.). RNTBD frames are dispatched via a separate TCP transport and are +/// invisible here. We assert two things: +/// +/// 1.
The capturing transport actually recorded at least one request — i.e. +/// the test setup is wired correctly and the runtime did make outbound +/// progress. +/// 2. Every captured request uses an `http`/`https` scheme. If a future +/// change ever tunnels wrapped RNTBD frames through HTTP (or pushes the +/// capture point lower in the stack so RNTBD is observable here), this +/// assertion fires and forces a reviewer to upgrade the test to parse +/// the wrapped frame and assert at-most-one consistency token per frame. +/// +/// The structural invariant inside the wrapped frame is exhaustively covered +/// by the inside-crate tests in `gateway20_dispatch::tests::wraps_with_*`; +/// this test exists to prevent that coverage from silently disappearing if +/// the V2 transport boundary moves. +#[tokio::test] +async fn v2_rntbd_never_emits_both_consistency_tokens() { + let (runtime, transport) = capturing_runtime(true).await; + probe(&runtime).await; + + let captured = transport.requests(); + assert!( + !captured.is_empty(), + "capturing transport recorded zero requests; the V2 invariant test \ + setup is broken (no traffic was generated at all)" + ); + + // CONTRACT LOCK: today every captured request is a V1 HTTP probe by + // construction. If this assertion ever fails, RNTBD-bearing traffic has + // become observable at the HttpClientFactory layer and the body must be + // structurally decoded to assert mutual exclusion of `ConsistencyLevel` + // and any future `ReadConsistencyStrategy` token. + // + // TODO(Phase 6): when a `ReadConsistencyStrategy` RNTBD token lands, + // replace this scheme check with a structural decode of the wrapped + // frame and assert at-most-one consistency token per wrapped request. 
+ for req in &captured { + let scheme = req.url.scheme(); + assert!( + scheme == "http" || scheme == "https", + "captured request to {} uses scheme {:?}; the V2 dual-token \ + contract lock is invalidated — upgrade this test to parse the \ + wrapped RNTBD frame and assert mutual exclusion of consistency \ + tokens", + req.url, + scheme, + ); + } +} + +// ---------------------------------------------------------------------------- +// (f) Capabilities header pin +// ---------------------------------------------------------------------------- + +/// Every outgoing HTTP request must carry +/// `x-ms-cosmos-sdk-supportedcapabilities: 9`. The bitmask "9" is the +/// bitwise OR of `PartitionMerge` (1) and `IgnoreUnknownRntbdTokens` (8), +/// which Gateway 2.0 inspects to decide whether the SDK can tolerate unknown +/// RNTBD tokens. +/// +/// This is the load-bearing forward-compatibility advertisement for Gateway +/// 2.0; it MUST stay pinned to "9" until any change to the advertised bits +/// is coordinated with a service-side rollout. +#[tokio::test] +async fn capabilities_header_value_is_pinned_to_nine() { + const CAPABILITIES: &str = "x-ms-cosmos-sdk-supportedcapabilities"; + + let (runtime, transport) = capturing_runtime(true).await; + probe(&runtime).await; + + let captured = transport.requests(); + assert!( + !captured.is_empty(), + "runtime should have emitted at least one request via the mock factory" + ); + + for req in &captured { + let value = req.headers.iter().find_map(|(name, value)| { + (name.as_str() == CAPABILITIES).then(|| value.as_str().to_owned()) + }); + assert_eq!( + value.as_deref(), + Some("9"), + "capabilities header missing or wrong on request to {}", + req.url + ); + } +} diff --git a/sdk/cosmos/ci-gateway20.yml b/sdk/cosmos/ci-gateway20.yml new file mode 100644 index 00000000000..18ee0c4b47d --- /dev/null +++ b/sdk/cosmos/ci-gateway20.yml @@ -0,0 +1,52 @@ +# NOTE: Please refer to https://aka.ms/azsdk/engsys/ci-yaml before editing this file.
+# +# Gateway 2.0 live-test pipeline. +# +# This pipeline runs the Cosmos client tests against a pre-provisioned Gateway 2.0 +# (a.k.a. "thin client") account. It is a sibling of `ci.yml` and shares the same +# template (`archetype-sdk-client.yml`) but uses a separate matrix +# (`live-gateway20-matrix.json`) so that the matrix entries map onto the dedicated +# test categories `gateway20` and `gateway20_multi_region`. +# +# The Gateway 2.0 account is **not** provisioned per-pipeline-run — it must be +# created out-of-band by the engineering system. The pipeline reads its endpoint +# and key from these pipeline secrets: +# +# AZURE_COSMOS_GW20_ENDPOINT — endpoint URL +# AZURE_COSMOS_GW20_KEY — primary master key +# +# The pipeline triggers only on PRs and manual / scheduled dispatch (no `branches` +# trigger). This avoids accidental runs on every main push while still allowing +# developers to validate Gateway 2.0 changes from a PR. + +trigger: none + +pr: + branches: + include: + - main + - hotfix/* + - release/* + paths: + include: + - sdk/cosmos/ + +parameters: +- name: RunLiveTests + displayName: Run live tests + type: boolean + default: true + +extends: + template: /eng/pipelines/templates/stages/archetype-sdk-client.yml + parameters: + ServiceDirectory: cosmos + RunLiveTests: ${{ parameters.RunLiveTests }} + CloudConfig: + Public: + ServiceConnection: azure-sdk-tests-cosmos + LiveTestMatrixConfigs: + - Name: Cosmos_gateway20_live_test + Path: sdk/cosmos/live-gateway20-matrix.json + Selection: sparse + GenerateVMJobs: true diff --git a/sdk/cosmos/live-gateway20-matrix.json b/sdk/cosmos/live-gateway20-matrix.json new file mode 100644 index 00000000000..4075306ca7b --- /dev/null +++ b/sdk/cosmos/live-gateway20-matrix.json @@ -0,0 +1,21 @@ +{ + "displayNames": {}, + "matrix": { + "Agent": { + "ubuntu": { + "OSVmImage": "env:LINUXVMIMAGE", + "Pool": "env:LINUXPOOL" + } + }, + "RustToolchainName": ["stable"], + "Gateway20 Settings": { + "Session SingleRegion": { + 
"ArmTemplateParameters": "@{ defaultConsistencyLevel = 'Session'; testCategory = 'gateway20' }" + }, + "Session MultiRegion": { + "ArmTemplateParameters": "@{ defaultConsistencyLevel = 'Session'; enableMultipleRegions = $true; testCategory = 'gateway20_multi_region' }" + } + } + }, + "include": [] +} From ea8f0eba5ea5896078b5f8f7a2eae5dc0744780c Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Thu, 30 Apr 2026 08:15:19 -0700 Subject: [PATCH 31/48] Document Gateway 2.0 capability bitmask (Rust=9 vs Java=11) Adds a "Capability bit composition" subsection to GATEWAY_20_SPEC.md under "SDK-supported-capabilities advertisement" that explains why the Rust driver advertises bitmask `9` while Java advertises `11`. The table breaks down each bit (PartitionMerge=1, IgnoreUnknownRntbdTokens=8, plus an additional Java-only capability at bit 1=2 left unnamed pending verification against Java source) and explicitly states that the Rust driver only advertises capabilities it implements end-to-end. Adding any new bit requires implementing the behavior first, then incrementing `SUPPORTED_CAPABILITIES_BITS` in `cosmos_headers.rs` and re-pinning the Phase 6 header-value test. This addresses one of the documented follow-ups from PR #4319. 
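For illustration only, here is a minimal Rust sketch of the composition, assuming plain `u32` bit constants. Apart from `SUPPORTED_CAPABILITIES_BITS`, every constant and function name below is hypothetical. The Java-only bit 1 is deliberately left unnamed (`JAVA_ONLY_BIT_1`), matching the spec table:

```rust
// Illustrative sketch. Only SUPPORTED_CAPABILITIES_BITS is a name taken from
// the spec; the other constants and the helper functions are hypothetical.
const PARTITION_MERGE: u32 = 1 << 0; // decimal 1
const JAVA_ONLY_BIT_1: u32 = 1 << 1; // decimal 2, advertised by Java only
const IGNORE_UNKNOWN_RNTBD_TOKENS: u32 = 1 << 3; // decimal 8

/// Bits the Rust driver advertises: only capabilities it implements end-to-end.
const SUPPORTED_CAPABILITIES_BITS: u32 = PARTITION_MERGE | IGNORE_UNKNOWN_RNTBD_TOKENS;

/// Bits Java advertises, kept here for comparison.
const JAVA_CAPABILITIES_BITS: u32 =
    PARTITION_MERGE | JAVA_ONLY_BIT_1 | IGNORE_UNKNOWN_RNTBD_TOKENS;

// Compile-time pin: changing the Rust mask without updating this line
// fails the build.
const _: () = assert!(SUPPORTED_CAPABILITIES_BITS == 9);

/// Decimal string sent in `x-ms-cosmos-sdk-supportedcapabilities`.
fn supported_capabilities_header_value() -> String {
    SUPPORTED_CAPABILITIES_BITS.to_string()
}

/// Returns true when `mask` advertises `capability`.
fn advertises(mask: u32, capability: u32) -> bool {
    (mask & capability) == capability
}
```

The `const _` assertion turns any accidental change to the advertised mask into a build failure. XOR-ing the two masks (`11 ^ 9`) leaves exactly bit 1 (`2`), which is the only difference between the SDKs.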
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index dc09653487d..e18a4170ee8 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -215,6 +215,18 @@ Phase 1 must change the emitted value to the bitmask `(PartitionMerge | IgnoreUn The `IgnoreUnknownRntbdTokens` bit is the contract that backs the silent-skip behavior in "Metadata token filtering" above: the proxy/backend uses this advertisement to decide whether it is safe to add new RNTBD tokens without coordinating with this SDK release. Advertising the bit while *also* failing or warning on unknown tokens would be a contract violation; advertising `"0"` while silently skipping unknown tokens is "merely conservative" but causes the proxy to assume zero forward-compat tolerance — both are wrong. Phase 1 must reconcile both ends. +##### Capability bit composition (Rust = `9`, Java = `11`) + +The bitmask the Rust driver advertises is **`9`** (`PartitionMerge | IgnoreUnknownRntbdTokens`). Pinned in `azure_data_cosmos_driver/src/driver/transport/cosmos_headers.rs:16-25` with a `const _: () = assert!(SUPPORTED_CAPABILITIES_BITS == 9);` invariant. 
The bits are sourced from .NET `SDKSupportedCapabilities.cs` and the C++ proxy enum: + +| Bit | Decimal | Capability | Rust advertises | Java advertises | Notes | +| ---- | ------- | ---------------------- | --------------- | --------------- | ------------------------------------------------------------------------------------------------------------------------------------ | +| 0 | 1 | `PartitionMerge` | yes | yes | Forward-compat with merged partition-key ranges; required for Gateway 2.0 because the proxy may surface merged ranges in routing. | +| 1 | 2 | (Java-only capability; name per Java `SDKSupportedCapabilities`) | **no** | yes | Java opts in to an additional capability the Rust driver does not yet consume. Unilaterally advertising it without honoring the corresponding behavior could cause mis-framing or unexpected proxy behavior. Verify the exact capability name against Java/.NET source before adding. Track in a follow-up if/when the driver grows the corresponding support. | +| 3 | 8 | `IgnoreUnknownRntbdTokens` | yes | yes | Forward-compat with new RNTBD response tokens added by future proxy/backend versions; backed by the silent-skip behavior in "Metadata token filtering" above. | + +Total: Rust `1 | 8 = 9`; Java `1 | 2 | 8 = 11`. The single-bit gap (bit 1, decimal `2`) is intentional and conservative: the Rust driver only advertises capabilities it actually implements end-to-end. Adding bit 1 (or any future bit) requires implementing the corresponding behavior first, then OR-ing the new bit into `SUPPORTED_CAPABILITIES_BITS` in `cosmos_headers.rs` and re-pinning the Phase 6 header-value test. + Phase 6 test coverage: assert the header value emitted on Gateway 2.0 (and standard Gateway) requests is the expected bitmask string, not `"0"`.
#### RNTBD Request Wire Format From 384844d17cf44c277f7f8505f6c5432e3bd40032 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Thu, 30 Apr 2026 08:42:23 -0700 Subject: [PATCH 32/48] Rename gateway20 flag to negative-term name (R15) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Per Gateway 2.0 spec §3, options that gate Gateway 2.0 must use negative-term names so the boolean default of `false` corresponds to the GA target state (Gateway 2.0 enabled). `is_gateway20_allowed` predated that rule. Changes: * `ConnectionPoolOptions::is_gateway20_allowed` → `gateway20_disabled` (semantics inverted) — field, getter, builder field, builder setter. * `ConnectionPoolOptionsBuilder::with_is_gateway20_allowed` → `with_gateway20_disabled` (semantics inverted). * Removed the `AZURE_COSMOS_CONNECTION_POOL_IS_GATEWAY20_ALLOWED` env var lookup. Spec §3 forbids env-var toggles for Gateway 2.0; getter doc-comment now explains why. * HTTP/2-disabled-forces-disabled rule preserved (`gateway20_requires_http2` test still verifies it). * Pre-GA the default stays at `true` (disabled) with an explicit TODO to flip to `false` once Slice 3d (EPK cutover), HPK partial-PK, and continuation-token format are complete. Defaulting on while the codepath is incomplete would route customer traffic through an unfinished pipeline. * All call sites in driver, transport tests, pipeline tests, the `gateway20_e2e` doc, and the perf binary updated. Booleans inverted at every `capturing_runtime(...)` and builder call. * TRANSPORT_PIPELINE_SPEC.md prose reference renamed in lock-step. * Perf binary's `gateway20_disabled` diagnostic field is hardcoded `true` for now with TODO to wire from the SDK option once that builder method lands (Phase A item 5). 
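To make the inverted semantics concrete, here is a simplified, hypothetical sketch. `PoolOptions`, `PoolOptionsBuilder`, and `use_gateway20` are illustrative stand-ins, not the driver's `ConnectionPoolOptions`. The sketch shows three behaviors: the negative-term flag defaults to `true` pre-GA, disabling HTTP/2 forces it to `true` whatever the caller requested, and call sites negate the getter:

```rust
// Hypothetical, simplified stand-in for ConnectionPoolOptions; not the
// driver's actual implementation.
#[derive(Debug, Clone, Copy)]
struct PoolOptions {
    is_http2_allowed: bool,
    gateway20_disabled: bool,
}

impl PoolOptions {
    fn is_http2_allowed(&self) -> bool {
        self.is_http2_allowed
    }

    /// Negative-term flag: `false` means Gateway 2.0 may be used.
    fn gateway20_disabled(&self) -> bool {
        self.gateway20_disabled
    }
}

#[derive(Debug, Default)]
struct PoolOptionsBuilder {
    is_http2_allowed: Option<bool>,
    gateway20_disabled: Option<bool>,
}

impl PoolOptionsBuilder {
    fn with_is_http2_allowed(mut self, allowed: bool) -> Self {
        self.is_http2_allowed = Some(allowed);
        self
    }

    fn with_gateway20_disabled(mut self, disabled: bool) -> Self {
        self.gateway20_disabled = Some(disabled);
        self
    }

    fn build(self) -> PoolOptions {
        let is_http2_allowed = self.is_http2_allowed.unwrap_or(true);
        // Pre-GA default is `true` (disabled); the GA target default is `false`.
        let requested = self.gateway20_disabled.unwrap_or(true);
        PoolOptions {
            is_http2_allowed,
            // Gateway 2.0 requires HTTP/2, so disabling HTTP/2 forces it off.
            gateway20_disabled: requested || !is_http2_allowed,
        }
    }
}

/// Call sites invert the old positive check: `!gateway20_disabled()`.
fn use_gateway20(pool: &PoolOptions, account_has_thin_client_endpoint: bool) -> bool {
    !pool.gateway20_disabled() && account_has_thin_client_endpoint
}
```

Modeling the builder fields as `Option<bool>` keeps the "unset" state distinct from an explicit `false`. That is what allows the pre-GA default to be flipped later in one place.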
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../tests/emulator_tests/gateway20_e2e.rs | 2 +- .../docs/TRANSPORT_PIPELINE_SPEC.md | 2 +- .../src/driver/cosmos_driver.rs | 2 +- .../src/driver/transport/mod.rs | 8 +- .../src/options/connection_pool.rs | 92 +++++++++++-------- .../tests/gateway20_pipeline_tests.rs | 37 ++++---- sdk/cosmos/azure_data_cosmos_perf/src/main.rs | 12 ++- .../azure_data_cosmos_perf/src/runner.rs | 6 +- 8 files changed, 90 insertions(+), 71 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs b/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs index a8fdcf2bfbf..13cb150815b 100644 --- a/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs +++ b/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs @@ -21,7 +21,7 @@ //! SDK cannot deterministically opt a client into Gateway 2.0 from outside //! the crate. //! * The driver-level toggle -//! (`ConnectionPoolOptions::with_is_gateway20_allowed`) is not wired through +//! (`ConnectionPoolOptions::with_gateway20_disabled`) is not wired through //! to `CosmosClientOptions`. //! //! Once the SDK exposes a public `with_gateway20_disabled` (or equivalent) diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/TRANSPORT_PIPELINE_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/TRANSPORT_PIPELINE_SPEC.md index 09361ad2736..ec5782df19d 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/TRANSPORT_PIPELINE_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/TRANSPORT_PIPELINE_SPEC.md @@ -2242,7 +2242,7 @@ HTTP/2 just uses a single `Arc` like HTTP/1.1. endpoints are detected and used. No sharding yet — stream limit may be hit under high load. 
> **Note on ALPN probing (§6.0):** The initial Step 5 implementation uses configuration flags -> (`is_http2_allowed`, `is_gateway20_allowed`) and `AccountProperties` metadata +> (`is_http2_allowed`, `gateway20_disabled`) and `AccountProperties` metadata > (`thinClient*Locations`) to determine the transport strategy, rather than runtime ALPN > negotiation against the gateway. This is sufficient because: > (1) reqwest with `http2` feature already performs ALPN automatically for `Http2Preferred`, diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/cosmos_driver.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/cosmos_driver.rs index 5f74982b439..25810111299 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/driver/cosmos_driver.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/cosmos_driver.rs @@ -750,7 +750,7 @@ impl CosmosDriver { account_endpoint, default_endpoint, refresh_callback, - runtime.connection_pool().is_gateway20_allowed(), + !runtime.connection_pool().gateway20_disabled(), endpoint_unavailability_ttl, options.preferred_regions().to_vec(), )); diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/mod.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/mod.rs index 4d158261b26..1b25c3b9bb8 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/mod.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/mod.rs @@ -291,7 +291,7 @@ impl CosmosTransport { } match transport_mode { - TransportMode::Gateway20 if self.connection_pool.is_gateway20_allowed() => { + TransportMode::Gateway20 if !self.connection_pool.gateway20_disabled() => { let transport = match self.dataplane_gateway20_transport.get() { Some(t) => t.clone(), None => { @@ -398,7 +398,7 @@ pub(crate) mod tests { #[test] fn dataplane_transport_uses_gateway20_when_selected() { let pool = ConnectionPoolOptionsBuilder::new() - .with_is_gateway20_allowed(true) + .with_gateway20_disabled(false) .build() .unwrap(); let transport = 
CosmosTransport::for_tests(pool, TransportHttpVersion::Http2).unwrap(); @@ -414,7 +414,7 @@ pub(crate) mod tests { #[test] fn dataplane_transport_falls_back_to_sharded_gateway_when_endpoint_is_standard() { let pool = ConnectionPoolOptionsBuilder::new() - .with_is_gateway20_allowed(true) + .with_gateway20_disabled(false) .build() .unwrap(); let transport = CosmosTransport::for_tests(pool, TransportHttpVersion::Http2).unwrap(); @@ -430,7 +430,7 @@ pub(crate) mod tests { #[test] fn dataplane_transport_ignores_gateway20_when_gateway20_disabled() { let pool = ConnectionPoolOptionsBuilder::new() - .with_is_gateway20_allowed(false) + .with_gateway20_disabled(true) .build() .unwrap(); let transport = CosmosTransport::for_tests(pool, TransportHttpVersion::Http2).unwrap(); diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/options/connection_pool.rs b/sdk/cosmos/azure_data_cosmos_driver/src/options/connection_pool.rs index 8bc79699cfd..0ca85de3632 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/options/connection_pool.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/options/connection_pool.rs @@ -63,7 +63,7 @@ pub struct ConnectionPoolOptions { is_http2_allowed: bool, - is_gateway20_allowed: bool, + gateway20_disabled: bool, emulator_server_cert_validation: EmulatorServerCertValidation, @@ -206,13 +206,19 @@ impl ConnectionPoolOptions { self.is_http2_allowed } - /// Returns whether Gateway 2.0 feature is allowed. + /// Returns whether Gateway 2.0 is disabled for this pool. /// - /// If `true`, the driver will use Gateway 2.0 features when communicating - /// with the Cosmos DB service (if the account supports it). Gateway 2.0 - /// requires HTTP/2, so this returns `false` if HTTP/2 is disabled. 
- pub fn is_gateway20_allowed(&self) -> bool { - self.is_gateway20_allowed + /// This flag is the single supported way to disable Gateway 2.0. It + /// defaults to `true` (disabled) while Gateway 2.0 is pre-GA; per + /// `GATEWAY_20_SPEC.md` §3 it uses a negative-term name so that the default + /// can later flip to "enabled" without call-site changes. When `true`, the + /// driver routes every request through the standard gateway transport, even + /// when the account advertises a thin-client endpoint. + /// + /// Gateway 2.0 also requires HTTP/2: when HTTP/2 is disabled, this method + /// returns `true` regardless of how the builder was configured. + pub fn gateway20_disabled(&self) -> bool { + self.gateway20_disabled } /// Returns the emulator server certificate validation setting. @@ -257,8 +263,12 @@ impl ConnectionPoolOptions { /// - `AZURE_COSMOS_CONNECTION_POOL_TCP_KEEPALIVE_INTERVAL_MS`: TCP keepalive probe interval in milliseconds (default: `1_000`, min: `1_000` when set) /// - `AZURE_COSMOS_CONNECTION_POOL_TCP_KEEPALIVE_RETRIES`: TCP keepalive retry count (default: none, min: `1`, max: `255`) /// - `AZURE_COSMOS_CONNECTION_POOL_IS_HTTP2_ALLOWED`: Whether HTTP/2 is allowed for gateway mode connections (default: `true`) -/// - `AZURE_COSMOS_CONNECTION_POOL_IS_GATEWAY20_ALLOWED`: Whether Gateway 2.0 feature is allowed (default: `false`) /// - `AZURE_COSMOS_EMULATOR_SERVER_CERT_VALIDATION_DISABLED`: Whether server certificate validation is disabled for emulator; `true` maps to [`EmulatorServerCertValidation::DangerousDisabled`], `false` to [`EmulatorServerCertValidation::Enabled`] (default: `false`) +/// +/// Gateway 2.0 is intentionally **not** controlled by an environment variable +/// (see `GATEWAY_20_SPEC.md` §3): the only supported disablement mechanism is +/// the [`with_gateway20_disabled`](ConnectionPoolOptionsBuilder::with_gateway20_disabled) +/// builder method.
/// - `AZURE_COSMOS_LOCAL_ADDRESS`: Local IP address to bind to (default: none) /// /// # Example @@ -296,7 +306,7 @@ pub struct ConnectionPoolOptionsBuilder { tcp_keepalive_interval: Option, tcp_keepalive_retries: Option, is_http2_allowed: Option, - is_gateway20_allowed: Option, + gateway20_disabled: Option, emulator_server_cert_validation: Option, local_address: Option, } @@ -490,9 +500,18 @@ impl ConnectionPoolOptionsBuilder { self } - /// Sets whether Gateway 2.0 feature is allowed. - pub fn with_is_gateway20_allowed(mut self, value: bool) -> Self { - self.is_gateway20_allowed = Some(value); + /// Disables Gateway 2.0 for this pool. + /// + /// Gateway 2.0 is disabled by default while it is pre-GA; pass `false` to + /// opt in, in which case it is used whenever the account advertises a + /// thin-client endpoint and HTTP/2 is allowed. Pass `true` to force every + /// request through the standard gateway transport (operator override). + /// + /// This is the single supported disablement mechanism per + /// `GATEWAY_20_SPEC.md` §3; there is intentionally no + /// `AZURE_COSMOS_*` environment variable that toggles Gateway 2.0.
+ pub fn with_gateway20_disabled(mut self, value: bool) -> Self { + self.gateway20_disabled = Some(value); self } @@ -532,25 +551,18 @@ impl ConnectionPoolOptionsBuilder { ValidationBounds::none(), )?; - let effective_is_gateway20_allowed = if let Some(gateway20) = self.is_gateway20_allowed { - gateway20 && effective_is_http2_allowed - } else { - match std::env::var("AZURE_COSMOS_CONNECTION_POOL_IS_GATEWAY20_ALLOWED") { - Ok(v) => { - let gateway20: bool = v.parse().map_err(|e| { - azure_core::Error::with_message( - azure_core::error::ErrorKind::DataConversion, - format!( - "Failed to parse AZURE_COSMOS_CONNECTION_POOL_IS_GATEWAY20_ALLOWED as boolean: {} ({})", - v, e - ), - ) - })?; - gateway20 && effective_is_http2_allowed - } - Err(_) => false, // TODO: Change to true before GA - } - }; + // Gateway 2.0 is currently disabled by default while the implementation + // is still in pre-GA. Per `GATEWAY_20_SPEC.md` §3, the field uses a + // negative-term name (`gateway20_disabled`) so that the default state + // can be flipped to "enabled" by changing only the literal below; no + // call sites or environment variables need to change. There is + // intentionally no `AZURE_COSMOS_*` env var that toggles Gateway 2.0. + // + // TODO: Change to `false` (Gateway 2.0 enabled by default) before GA. + let explicit_disabled = self.gateway20_disabled.unwrap_or(true); + // HTTP/2 is a hard prerequisite for Gateway 2.0 — when HTTP/2 is off + // the pool is effectively gateway20-disabled regardless of the flag. 
+ let effective_gateway20_disabled = explicit_disabled || !effective_is_http2_allowed; let max_connection_pool_size_default = if effective_is_http2_allowed { 1_000 @@ -765,7 +777,7 @@ impl ConnectionPoolOptionsBuilder { tcp_keepalive_interval, tcp_keepalive_retries, is_http2_allowed: effective_is_http2_allowed, - is_gateway20_allowed: effective_is_gateway20_allowed, + gateway20_disabled: effective_gateway20_disabled, emulator_server_cert_validation: match self.emulator_server_cert_validation { Some(v) => v, None => EmulatorServerCertValidation::from(parse_from_env( @@ -822,7 +834,7 @@ mod tests { Duration::from_millis(65_000) ); assert!(options.is_http2_allowed()); - assert!(!options.is_gateway20_allowed()); + assert!(options.gateway20_disabled()); assert_eq!( options.emulator_server_cert_validation(), EmulatorServerCertValidation::Enabled @@ -879,7 +891,7 @@ mod tests { .with_tcp_keepalive_interval(Duration::from_millis(5_000)) .with_tcp_keepalive_retries(4) .with_is_http2_allowed(false) - .with_is_gateway20_allowed(true) + .with_gateway20_disabled(false) .with_emulator_server_cert_validation(EmulatorServerCertValidation::DangerousDisabled) .build() .unwrap(); @@ -942,8 +954,9 @@ mod tests { ); assert_eq!(options.tcp_keepalive_retries(), Some(4)); assert!(!options.is_http2_allowed()); - // gateway20 is set to true but HTTP/2 is false, so it should be false - assert!(!options.is_gateway20_allowed()); + // gateway20 was opted in via with_gateway20_disabled(false), but HTTP/2 is + // off, so the build forces gateway20_disabled = true. 
+ assert!(options.gateway20_disabled()); assert_eq!( options.emulator_server_cert_validation(), EmulatorServerCertValidation::DangerousDisabled @@ -1215,12 +1228,13 @@ mod tests { fn gateway20_requires_http2() { let options = ConnectionPoolOptionsBuilder::new() .with_is_http2_allowed(false) - .with_is_gateway20_allowed(true) + .with_gateway20_disabled(false) .build() .unwrap(); - // Gateway 2.0 should be disabled if HTTP/2 is not allowed - assert!(!options.is_gateway20_allowed()); + // Gateway 2.0 must be reported as disabled if HTTP/2 is not allowed, + // even when the operator explicitly opted in via with_gateway20_disabled(false). + assert!(options.gateway20_disabled()); } #[test] diff --git a/sdk/cosmos/azure_data_cosmos_driver/tests/gateway20_pipeline_tests.rs b/sdk/cosmos/azure_data_cosmos_driver/tests/gateway20_pipeline_tests.rs index 19a550f5869..f67b1d038e5 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/tests/gateway20_pipeline_tests.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/tests/gateway20_pipeline_tests.rs @@ -12,7 +12,7 @@ //! //! 1. **Operator override** — the operator can opt out of Gateway 2.0 even when //! the account advertises a thin-client endpoint. Verified via the public -//! [`ConnectionPoolOptions::with_is_gateway20_allowed`] toggle. +//! [`ConnectionPoolOptions::with_gateway20_disabled`] toggle. //! //! 2. **Operation eligibility** — operations that Gateway 2.0 does not yet //! support (e.g., stored procedure execution) must transparently fall back @@ -151,14 +151,15 @@ fn live_account_from_env() -> Option { } /// Builds a runtime with the capturing factory and the requested -/// gateway-20 toggle. The two flags reflect the operator override exposed via -/// `ConnectionPoolOptions`. +/// gateway-20 toggle. The flag reflects the operator override exposed via +/// `ConnectionPoolOptions` — passing `true` forces every request through the +/// standard gateway transport. 
async fn capturing_runtime( - is_gateway20_allowed: bool, + gateway20_disabled: bool, ) -> (Arc, Arc) { let (factory, transport) = CapturingFactory::new(); let pool = ConnectionPoolOptions::builder() - .with_is_gateway20_allowed(is_gateway20_allowed) + .with_gateway20_disabled(gateway20_disabled) .build() .expect("connection pool builds"); let runtime = CosmosDriverRuntime::builder() @@ -184,34 +185,34 @@ async fn probe(runtime: &Arc) { // (a) Operator override forces standard gateway routing // ---------------------------------------------------------------------------- -/// Verifies that the operator override flag (`with_is_gateway20_allowed(false)`) -/// is honored end-to-end at the connection-pool level. When the flag is off, +/// Verifies that the operator override flag (`with_gateway20_disabled(true)`) +/// is honored end-to-end at the connection-pool level. When the flag is set, /// the runtime must not select the Gateway 2.0 transport even if account /// metadata advertises a thin-client endpoint. /// /// We assert the contract structurally via `ConnectionPoolOptions`: when the -/// flag is `false`, `is_gateway20_allowed()` reports `false`, and the +/// flag is `true`, `gateway20_disabled()` reports `true`, and the /// transport-layer dispatcher branches to the standard gateway (this branching /// is covered by the inside-crate tests in /// `driver::transport::tests::dataplane_transport_*`). 
#[tokio::test] async fn operator_override_disables_gateway20_at_pool_level() { let off = ConnectionPoolOptions::builder() - .with_is_gateway20_allowed(false) + .with_gateway20_disabled(true) .build() .expect("pool builds"); assert!( - !off.is_gateway20_allowed(), - "operator-disabled pool must report is_gateway20_allowed = false" + off.gateway20_disabled(), + "operator-disabled pool must report gateway20_disabled = true" ); let on = ConnectionPoolOptions::builder() - .with_is_gateway20_allowed(true) + .with_gateway20_disabled(false) .build() .expect("pool builds"); assert!( - on.is_gateway20_allowed(), - "operator-enabled pool must report is_gateway20_allowed = true" + !on.gateway20_disabled(), + "operator-enabled pool must report gateway20_disabled = false" ); } @@ -229,7 +230,7 @@ async fn operator_override_routes_reads_to_standard_gateway() { // TODO(Phase 6): once diagnostics expose `TransportKind` per request, // assert that every request used `TransportKind::StandardGateway`. let pool = ConnectionPoolOptions::builder() - .with_is_gateway20_allowed(false) + .with_gateway20_disabled(true) .build() .expect("pool builds"); let runtime = CosmosDriverRuntime::builder() @@ -319,7 +320,7 @@ async fn v1_http_never_emits_both_consistency_headers() { const LEGACY: &str = "x-ms-consistency-level"; const STRATEGY: &str = "x-ms-cosmos-read-consistency-strategy"; - let (runtime, transport) = capturing_runtime(false).await; + let (runtime, transport) = capturing_runtime(true).await; probe(&runtime).await; let captured = transport.requests(); @@ -371,7 +372,7 @@ async fn v1_http_never_emits_both_consistency_headers() { /// the V2 transport boundary moves. 
#[tokio::test] async fn v2_rntbd_never_emits_both_consistency_tokens() { - let (runtime, transport) = capturing_runtime(true).await; + let (runtime, transport) = capturing_runtime(false).await; probe(&runtime).await; let captured = transport.requests(); @@ -421,7 +422,7 @@ async fn v2_rntbd_never_emits_both_consistency_tokens() { async fn capabilities_header_value_is_pinned_to_nine() { const CAPABILITIES: &str = "x-ms-cosmos-sdk-supportedcapabilities"; - let (runtime, transport) = capturing_runtime(true).await; + let (runtime, transport) = capturing_runtime(false).await; probe(&runtime).await; let captured = transport.requests(); diff --git a/sdk/cosmos/azure_data_cosmos_perf/src/main.rs b/sdk/cosmos/azure_data_cosmos_perf/src/main.rs index 719f52d4cb2..eb4a284c29d 100644 --- a/sdk/cosmos/azure_data_cosmos_perf/src/main.rs +++ b/sdk/cosmos/azure_data_cosmos_perf/src/main.rs @@ -282,10 +282,14 @@ async fn main() -> Result<(), Box> { .ok() .and_then(|v| v.parse::().ok()) .unwrap_or(true), - gateway20_allowed: std::env::var("AZURE_COSMOS_CONNECTION_POOL_IS_GATEWAY20_ALLOWED") - .ok() - .and_then(|v| v.parse::().ok()) - .unwrap_or(false), + // Gateway 2.0 is intentionally not toggled via env var (see + // GATEWAY_20_SPEC.md §3). Until the perf binary wires through the + // public SDK toggle (`CosmosClientOptions::with_gateway20_disabled`), + // it inherits whatever default the SDK ships with — currently + // disabled (pre-GA). + // TODO: Read the actual configured value from the SDK once the + // public toggle lands. 
+ gateway20_disabled: true, pyroscope_enabled: std::env::var("PYROSCOPE_SERVER_URL") .map(|v| !v.is_empty()) .unwrap_or(false), diff --git a/sdk/cosmos/azure_data_cosmos_perf/src/runner.rs b/sdk/cosmos/azure_data_cosmos_perf/src/runner.rs index fdda04c78f8..ab03cb87191 100644 --- a/sdk/cosmos/azure_data_cosmos_perf/src/runner.rs +++ b/sdk/cosmos/azure_data_cosmos_perf/src/runner.rs @@ -77,7 +77,7 @@ struct PerfResult { #[serde(skip_serializing_if = "Option::is_none")] config_ppcb_enabled: Option, #[serde(skip_serializing_if = "Option::is_none")] - config_gateway20_allowed: Option, + config_gateway20_disabled: Option, #[serde(skip_serializing_if = "Option::is_none")] config_pyroscope_enabled: Option, #[serde(skip_serializing_if = "Option::is_none")] @@ -135,7 +135,7 @@ pub struct ConfigSnapshot { pub excluded_regions: String, pub tokio_threads: u64, pub ppcb_enabled: bool, - pub gateway20_allowed: bool, + pub gateway20_disabled: bool, pub pyroscope_enabled: bool, pub tokio_console_enabled: bool, pub tokio_metrics_enabled: bool, @@ -392,7 +392,7 @@ async fn upsert_results( }, config_tokio_threads: Some(config.tokio_threads), config_ppcb_enabled: Some(config.ppcb_enabled), - config_gateway20_allowed: Some(config.gateway20_allowed), + config_gateway20_disabled: Some(config.gateway20_disabled), config_pyroscope_enabled: Some(config.pyroscope_enabled), config_tokio_console_enabled: Some(config.tokio_console_enabled), config_tokio_metrics_enabled: Some(config.tokio_metrics_enabled), From 76f8f4ff5ca8dbebbd15adda6cc316485a45a93b Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Thu, 30 Apr 2026 08:42:33 -0700 Subject: [PATCH 33/48] Remove THINCLIENT_PROXY_* deprecated SDK aliases MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit These were re-exports of the canonical driver-level `GATEWAY20_OPERATION_TYPE` / `GATEWAY20_RESOURCE_TYPE` constants, marked `#[deprecated(since = "0.33.0")]`. 
The Gateway 2.0 work is pre-GA and there are no released callers depending on the old identifier — back-compat aliases are not needed and only add API surface that we'd have to delete later. Removes: * `THINCLIENT_PROXY_OPERATION_TYPE` * `THINCLIENT_PROXY_RESOURCE_TYPE` The `COSMOS_ALLOWED_HEADERS` macro continues to reference the canonical driver constants directly (unchanged), so the headers stay on the logging allowlist. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- sdk/cosmos/azure_data_cosmos/src/constants.rs | 20 ------------------- 1 file changed, 20 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos/src/constants.rs b/sdk/cosmos/azure_data_cosmos/src/constants.rs index d4ddddd0283..db5f85ebb8c 100644 --- a/sdk/cosmos/azure_data_cosmos/src/constants.rs +++ b/sdk/cosmos/azure_data_cosmos/src/constants.rs @@ -198,26 +198,6 @@ cosmos_headers! { FAULT_INJECTION_CONTAINER_ID => "x-ms-fault-injection-container-id", } -/// Deprecated alias for the Gateway 2.0 proxy operation-type header. -/// -/// Use the driver-level `GATEWAY20_OPERATION_TYPE` constant for new code. -#[deprecated( - since = "0.33.0", - note = "Use `azure_data_cosmos_driver::constants::GATEWAY20_OPERATION_TYPE` instead." -)] -pub const THINCLIENT_PROXY_OPERATION_TYPE: HeaderName = - azure_data_cosmos_driver::constants::GATEWAY20_OPERATION_TYPE; - -/// Deprecated alias for the Gateway 2.0 proxy resource-type header. -/// -/// Use the driver-level `GATEWAY20_RESOURCE_TYPE` constant for new code. -#[deprecated( - since = "0.33.0", - note = "Use `azure_data_cosmos_driver::constants::GATEWAY20_RESOURCE_TYPE` instead." 
-)] -pub const THINCLIENT_PROXY_RESOURCE_TYPE: HeaderName = - azure_data_cosmos_driver::constants::GATEWAY20_RESOURCE_TYPE; - pub const QUERY_CONTENT_TYPE: ContentType = ContentType::from_static("application/query+json"); pub(crate) const PREFER_MINIMAL: HeaderValue = HeaderValue::from_static("return=minimal"); From 953fabf31b88d3f913cdc7eef4e60f91e9fa3c33 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Thu, 30 Apr 2026 08:54:10 -0700 Subject: [PATCH 34/48] Add transport_kind filter to FaultInjectionCondition MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds an optional `TransportKind` filter to fault injection rules so that callers can scope a rule to Gateway 1.x or Gateway 2.0 traffic without affecting metadata clients or the other dataplane transport. Driver: * `FaultInjectionCondition` gains a `transport_kind: Option` field plus a `with_transport_kind` builder method. * `FaultInjectionEvaluation` gains a `TransportKindMismatch` variant emitted when a rule restricts to a transport that this client does not serve. * `FaultClient` now carries the bound transport kind (set at construction by `FaultInjectingHttpClientFactory` from `HttpClientConfig`). Metadata clients have `transport_kind == None` and so never match a rule that requires a specific transport — this prevents a Gateway-2.0 rule from accidentally firing on account discovery or other metadata traffic. * `HttpClientConfig` records the transport kind for each constructor: metadata = None, dataplane gateway = Some(Gateway), dataplane Gateway 2.0 = Some(Gateway20). * Three new unit tests in `fault_injection::http_client::tests` cover the match / mismatch / metadata-client cases. SDK: * `FaultInjectionCondition` mirrors the new field and builder method. * `fault_injection::TransportKind` is re-exported from the driver so SDK consumers do not need to depend on the driver crate. * `driver_bridge` translates the SDK-side `transport_kind` through to the driver builder. 
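The matching rule described above can be summarized as a small, self-contained Rust sketch. It is illustrative only: `TransportKind` here is a simplified two-variant stand-in for the driver enum, and `Evaluation`, `Applicable`, and `evaluate_transport_kind` are hypothetical names, not the actual `FaultClient` code (which this message does not show). Only `TransportKindMismatch` mirrors a real `FaultInjectionEvaluation` variant.

```rust
/// Simplified stand-in for the driver's `TransportKind` (assumption: only the
/// two dataplane variants used by `HttpClientConfig` are modeled).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TransportKind {
    Gateway,
    Gateway20,
}

/// Hypothetical, trimmed-down evaluation result covering only the
/// transport-kind check.
#[derive(Debug, PartialEq, Eq)]
pub enum Evaluation {
    /// The transport-kind filter does not exclude this client; the other
    /// condition checks (operation type, region, container) would still run.
    Applicable,
    /// Mirrors `FaultInjectionEvaluation::TransportKindMismatch`.
    TransportKindMismatch { rule_id: String },
}

/// Hypothetical helper: checks a rule's transport filter against the
/// transport kind bound to the client when it was constructed.
///
/// * `rule_kind == None`: the rule applies to every client, metadata included.
/// * `rule_kind == Some(k)`: only clients bound to `k` match; metadata
///   clients (`client_kind == None`) never do.
pub fn evaluate_transport_kind(
    rule_id: &str,
    rule_kind: Option<TransportKind>,
    client_kind: Option<TransportKind>,
) -> Evaluation {
    match rule_kind {
        // No filter: the transport never disqualifies the rule.
        None => Evaluation::Applicable,
        // Filter present and the client is bound to the same transport.
        Some(required) if client_kind == Some(required) => Evaluation::Applicable,
        // Filter present, but the client serves another transport or is a
        // metadata client with no transport kind at all.
        Some(_) => Evaluation::TransportKindMismatch {
            rule_id: rule_id.to_string(),
        },
    }
}
```

A rule without a transport filter still matches metadata clients; only a rule that names a transport skips them, which is what keeps a Gateway 2.0 rule off account-discovery traffic.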
Tests: * The three Gateway 2.0 fault-injection tests in `driver_fault_injection.rs` now scope their rules to `TransportKind::Gateway20` and are gated behind the `gateway20` test category. The rule semantics are now correct, and the tests no longer fire spuriously on emulator account-discovery traffic. * The matching SDK-side test in `cosmos_fault_injection.rs` is similarly scoped and gated, and now asserts that the read succeeds. Today the SDK never selects Gateway 2.0, so the scoped rule never fires; once the public SDK Gateway 2.0 toggle lands, the same assertion verifies the standard-gateway fallback. * `azure_data_cosmos_driver/build.rs` declares the new `gateway20` test_category value (the SDK `build.rs` already declared it). Resolves the `with_transport_kind` follow-up from PR #4319. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../azure_data_cosmos/src/driver_bridge.rs | 6 + .../src/fault_injection/condition.rs | 30 +++- .../src/fault_injection/mod.rs | 6 + .../emulator_tests/cosmos_fault_injection.rs | 45 +++-- sdk/cosmos/azure_data_cosmos_driver/build.rs | 4 +- .../driver/transport/http_client_factory.rs | 12 +- .../src/fault_injection/condition.rs | 36 +++- .../src/fault_injection/evaluation.rs | 10 ++ .../fault_injecting_factory.rs | 7 +- .../src/fault_injection/http_client.rs | 163 +++++++++++++++--- .../emulator_tests/driver_fault_injection.rs | 42 ++--- 11 files changed, 288 insertions(+), 73 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos/src/driver_bridge.rs b/sdk/cosmos/azure_data_cosmos/src/driver_bridge.rs index aecc399a867..14e9137a8b5 100644 --- a/sdk/cosmos/azure_data_cosmos/src/driver_bridge.rs +++ b/sdk/cosmos/azure_data_cosmos/src/driver_bridge.rs @@ -142,6 +142,12 @@ pub(crate) fn sdk_fi_rules_to_driver_fi_rules( if let Some(container_id) = &sdk_rule.condition.container_id { cond_builder = cond_builder.with_container_id(container_id.clone()); } + if let Some(transport_kind) = sdk_rule.condition.transport_kind { + // SDK and driver share the same `TransportKind` type + //
(the SDK re-exports it from the driver), so no enum + // conversion is required. + cond_builder = cond_builder.with_transport_kind(transport_kind); + } // Translate result let mut result_builder = DriverResultBuilder::new(); diff --git a/sdk/cosmos/azure_data_cosmos/src/fault_injection/condition.rs b/sdk/cosmos/azure_data_cosmos/src/fault_injection/condition.rs index 1201497c7d4..0dff61ff2de 100644 --- a/sdk/cosmos/azure_data_cosmos/src/fault_injection/condition.rs +++ b/sdk/cosmos/azure_data_cosmos/src/fault_injection/condition.rs @@ -3,7 +3,7 @@ //! Defines conditions for when fault injection rules should be applied. -use super::FaultOperationType; +use super::{FaultOperationType, TransportKind}; use crate::regions::Region; /// Defines the condition under which a fault injection rule should be applied. @@ -15,6 +15,12 @@ pub struct FaultInjectionCondition { pub region: Option<Region>, /// The container ID to which the fault injection applies. pub container_id: Option<String>, + /// Restricts the rule to a specific transport kind (Gateway 1.x vs + /// Gateway 2.0). When `None`, the rule applies regardless of which + /// dataplane transport carries the request. When `Some`, the rule + /// only applies to clients bound to that transport; metadata + /// clients always skip the rule. + pub transport_kind: Option<TransportKind>, } /// Builder for creating a FaultInjectionCondition. @@ -23,6 +29,7 @@ pub struct FaultInjectionConditionBuilder { operation_type: Option<FaultOperationType>, region: Option<Region>, container_id: Option<String>, + transport_kind: Option<TransportKind>, } impl FaultInjectionConditionBuilder { @@ -32,6 +39,7 @@ impl FaultInjectionConditionBuilder { operation_type: None, region: None, container_id: None, + transport_kind: None, } } @@ -53,19 +61,28 @@ impl FaultInjectionConditionBuilder { self } + /// Restricts the rule to a specific transport kind (e.g., + /// [`TransportKind::Gateway20`]). When set, the rule only matches + /// requests carried by the matching transport.
+ pub fn with_transport_kind(mut self, transport_kind: TransportKind) -> Self { + self.transport_kind = Some(transport_kind); + self + } + /// Builds the FaultInjectionCondition. pub fn build(self) -> FaultInjectionCondition { FaultInjectionCondition { operation_type: self.operation_type, region: self.region, container_id: self.container_id, + transport_kind: self.transport_kind, } } } #[cfg(test)] mod tests { - use super::FaultInjectionConditionBuilder; + use super::{FaultInjectionConditionBuilder, TransportKind}; #[test] fn builder_default() { @@ -74,5 +91,14 @@ mod tests { assert!(condition.operation_type.is_none()); assert!(condition.region.is_none()); assert!(condition.container_id.is_none()); + assert!(condition.transport_kind.is_none()); + } + + #[test] + fn with_transport_kind_sets_field() { + let condition = FaultInjectionConditionBuilder::new() + .with_transport_kind(TransportKind::Gateway20) + .build(); + assert_eq!(condition.transport_kind, Some(TransportKind::Gateway20)); } } diff --git a/sdk/cosmos/azure_data_cosmos/src/fault_injection/mod.rs b/sdk/cosmos/azure_data_cosmos/src/fault_injection/mod.rs index f0cee5374be..cec36071201 100644 --- a/sdk/cosmos/azure_data_cosmos/src/fault_injection/mod.rs +++ b/sdk/cosmos/azure_data_cosmos/src/fault_injection/mod.rs @@ -113,6 +113,12 @@ pub use result::{ }; pub use rule::{FaultInjectionRule, FaultInjectionRuleBuilder}; +/// Re-export of the driver's [`TransportKind`](azure_data_cosmos_driver::diagnostics::TransportKind) +/// enum so SDK consumers can scope fault-injection rules to a specific +/// transport (Gateway 1.x vs Gateway 2.0) without depending on the +/// driver crate directly. +pub use azure_data_cosmos_driver::diagnostics::TransportKind; + /// Represents different server error types that can be injected for fault testing. 
#[derive(Clone, Copy, Debug, PartialEq, Eq)] pub enum FaultInjectionErrorType { diff --git a/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/cosmos_fault_injection.rs b/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/cosmos_fault_injection.rs index fcfd2f7f848..312ad8ba907 100644 --- a/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/cosmos_fault_injection.rs +++ b/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/cosmos_fault_injection.rs @@ -12,7 +12,7 @@ use super::framework; use azure_core::{http::StatusCode, Uuid}; use azure_data_cosmos::fault_injection::{ FaultInjectionClientBuilder, FaultInjectionConditionBuilder, FaultInjectionErrorType, - FaultInjectionResultBuilder, FaultInjectionRuleBuilder, FaultOperationType, + FaultInjectionResultBuilder, FaultInjectionRuleBuilder, FaultOperationType, TransportKind, }; use azure_data_cosmos::models::{ContainerProperties, ThroughputProperties}; use framework::{get_effective_hub_endpoint, TestClient, TestOptions}; @@ -1005,20 +1005,20 @@ pub async fn fault_injection_enable_disable_rule() -> Result<(), Box> /// transparently — the client must not surface the connection failure to the /// caller when a usable fallback transport exists. /// -/// **Limitations**: -/// * `FaultInjectionCondition` does not yet expose a per-transport-kind -/// filter, so the rule fires on whichever transport is selected at dispatch -/// time. Today the SDK does not expose a public Gateway 2.0 enable API -/// (see `CosmosClientOptions`), so the SDK currently never selects the -/// Gateway 2.0 transport — this test runs in standard-gateway mode only. -/// * Once `CosmosClientOptions` exposes a Gateway 2.0 toggle and -/// `FaultInjectionCondition` supports `with_transport_kind`, scope this -/// rule to `TransportKind::Gateway20` and assert the request **succeeds** -/// with the standard-gateway fallback recorded in diagnostics. 
+/// The rule is scoped to [`TransportKind::Gateway20`] via +/// `with_transport_kind`, so it only fires on Gateway 2.0 traffic and never +/// on standard-gateway requests. +/// +/// **Limitation**: the SDK does not yet expose a public Gateway 2.0 enable +/// API on `CosmosClientOptions`, so the SDK currently never selects the +/// Gateway 2.0 transport. Until that toggle lands, this test is gated behind +/// the `gateway20` test category, and the scoped rule never fires, so the +/// read succeeds on the standard gateway. Once the toggle ships, the same +/// `is_ok()` assertion verifies the standard-gateway fallback. #[tokio::test] #[cfg_attr( - not(test_category = "emulator"), - ignore = "requires test_category 'emulator'" + not(test_category = "gateway20"), + ignore = "requires test_category 'gateway20'" )] pub async fn gateway20_connection_error_falls_back_to_standard_gateway( ) -> Result<(), Box> { @@ -1029,6 +1029,7 @@ pub async fn gateway20_connection_error_falls_back_to_standard_gateway( let condition = FaultInjectionConditionBuilder::new() .with_operation_type(FaultOperationType::ReadItem) + .with_transport_kind(TransportKind::Gateway20) .build(); let rule = FaultInjectionRuleBuilder::new("gateway20-conn-error-fallback", server_error) @@ -1061,21 +1062,17 @@ pub async fn gateway20_connection_error_falls_back_to_standard_gateway( let fault_db_client = fault_client.database_client(db_client.id()); let fault_container_client = fault_db_client.container_client(&container_id).await?; - // Today the rule fires on the standard gateway path (the SDK does - // not yet route through Gateway 2.0). The read should fail because - // there is no further fallback below the standard gateway. - // - // TODO(Phase 6): once the SDK exposes a public Gateway 2.0 - // enable API and `FaultInjectionCondition` supports a - // per-transport-kind filter, scope the rule to - // `TransportKind::Gateway20` and assert this read SUCCEEDS via - // the standard-gateway fallback.
+ // Once the SDK exposes a public Gateway 2.0 enable API, this read + // should SUCCEED via the standard-gateway fallback (the rule + // fires only on Gateway 2.0, leaving the fallback transport + // untouched). let result = fault_container_client .read_item::(&pk, &item_id, None) .await; assert!( - result.is_err(), - "Today the read should fail; once Gateway 2.0 fallback lands, this should succeed" + result.is_ok(), + "Read should succeed via the standard-gateway fallback when \ + the rule is scoped to Gateway 2.0" ); Ok(()) diff --git a/sdk/cosmos/azure_data_cosmos_driver/build.rs b/sdk/cosmos/azure_data_cosmos_driver/build.rs index 03429fb6dca..a631bbe5814 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/build.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/build.rs @@ -6,5 +6,7 @@ // unknown cfg names are warned/denied unless explicitly declared via check-cfg. fn main() { // Allow `#[cfg_attr(not(test_category = "..."), ignore)]` in `tests/*.rs`. - println!("cargo:rustc-check-cfg=cfg(test_category, values(\"emulator\", \"multi_write\"))"); + println!( + "cargo:rustc-check-cfg=cfg(test_category, values(\"emulator\", \"multi_write\", \"gateway20\"))" + ); } diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/http_client_factory.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/http_client_factory.rs index 40f893dc481..30b364a1b23 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/http_client_factory.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/http_client_factory.rs @@ -7,7 +7,7 @@ use std::{fmt, sync::Arc}; use super::cosmos_transport_client::TransportClient; -use crate::diagnostics::TransportHttpVersion; +use crate::diagnostics::{TransportHttpVersion, TransportKind}; use crate::options::ConnectionPoolOptions; /// HTTP protocol policy required by a transport. 
@@ -26,6 +26,13 @@ pub struct HttpClientConfig { pub(crate) request_timeout: std::time::Duration, pub(crate) allow_invalid_cert: bool, pub(crate) http2_keep_alive_while_idle: bool, + /// The transport kind this HTTP client serves, when it is bound to a + /// dataplane transport. Metadata clients (account discovery, etc.) leave + /// this `None` because they are not gateway/Gateway-2.0-specific. + /// + /// This is consumed by the fault-injection layer so rules can scope + /// themselves to a specific transport (`with_transport_kind`). + pub(crate) transport_kind: Option<TransportKind>, } impl HttpClientConfig { @@ -42,6 +49,7 @@ impl HttpClientConfig { request_timeout: connection_pool.max_metadata_request_timeout(), allow_invalid_cert: false, http2_keep_alive_while_idle: negotiated_version.is_http2(), + transport_kind: None, } } @@ -58,6 +66,7 @@ impl HttpClientConfig { request_timeout: connection_pool.max_dataplane_request_timeout(), allow_invalid_cert: false, http2_keep_alive_while_idle: negotiated_version.is_http2(), + transport_kind: Some(TransportKind::Gateway), } } @@ -68,6 +77,7 @@ impl HttpClientConfig { request_timeout: connection_pool.max_dataplane_request_timeout(), allow_invalid_cert: false, http2_keep_alive_while_idle: true, + transport_kind: Some(TransportKind::Gateway20), } } diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/fault_injection/condition.rs b/sdk/cosmos/azure_data_cosmos_driver/src/fault_injection/condition.rs index 114383144c0..1ecfc3b3eaa 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/fault_injection/condition.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/fault_injection/condition.rs @@ -4,6 +4,7 @@ //! Defines conditions for when fault injection rules should be applied. use super::FaultOperationType; +use crate::diagnostics::TransportKind; use crate::options::Region; /// Defines the condition under which a fault injection rule should be applied.
@@ -13,6 +14,7 @@ pub struct FaultInjectionCondition { operation_type: Option<FaultOperationType>, region: Option<Region>, container_id: Option<String>, + transport_kind: Option<TransportKind>, } impl FaultInjectionCondition { @@ -30,6 +32,15 @@ impl FaultInjectionCondition { pub fn container_id(&self) -> Option<&str> { self.container_id.as_deref() } + + /// Returns the transport kind to which the fault injection applies. + /// + /// When `Some`, the rule only matches requests that travelled through the + /// specified transport (e.g. `TransportKind::Gateway20`). When `None`, the + /// rule matches every transport (including metadata, gateway, and Gateway 2.0). + pub fn transport_kind(&self) -> Option<TransportKind> { + self.transport_kind + } } /// Builder for creating a FaultInjectionCondition. @@ -38,6 +49,7 @@ pub struct FaultInjectionConditionBuilder { operation_type: Option<FaultOperationType>, region: Option<Region>, container_id: Option<String>, + transport_kind: Option<TransportKind>, } impl FaultInjectionConditionBuilder { @@ -47,6 +59,7 @@ impl FaultInjectionConditionBuilder { operation_type: None, region: None, container_id: None, + transport_kind: None, } } @@ -68,12 +81,24 @@ impl FaultInjectionConditionBuilder { self } + /// Restricts the rule to a specific transport kind. + /// + /// Use this to scope a fault to (for example) only Gateway 2.0 traffic + /// (`TransportKind::Gateway20`) while leaving the standard gateway path + /// untouched. When unset, the rule applies regardless of which transport + /// carried the request. + pub fn with_transport_kind(mut self, transport_kind: TransportKind) -> Self { + self.transport_kind = Some(transport_kind); + self + } + /// Builds the FaultInjectionCondition.
pub fn build(self) -> FaultInjectionCondition { FaultInjectionCondition { operation_type: self.operation_type, region: self.region, container_id: self.container_id, + transport_kind: self.transport_kind, } } } @@ -81,6 +106,7 @@ impl FaultInjectionConditionBuilder { #[cfg(test)] mod tests { use super::FaultInjectionConditionBuilder; + use crate::diagnostics::TransportKind; #[test] fn builder_default() { @@ -88,6 +114,14 @@ mod tests { let condition = builder.build(); assert!(condition.operation_type().is_none()); assert!(condition.region().is_none()); - assert!(condition.container_id().is_none()); + assert!(condition.transport_kind().is_none()); + } + + #[test] + fn with_transport_kind_round_trip() { + let condition = FaultInjectionConditionBuilder::new() + .with_transport_kind(TransportKind::Gateway20) + .build(); + assert_eq!(condition.transport_kind(), Some(TransportKind::Gateway20)); } } diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/fault_injection/evaluation.rs b/sdk/cosmos/azure_data_cosmos_driver/src/fault_injection/evaluation.rs index 09f6aaed62e..5dd147e6dbc 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/fault_injection/evaluation.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/fault_injection/evaluation.rs @@ -61,6 +61,12 @@ pub enum FaultInjectionEvaluation { /// The ID of the rule. rule_id: String, }, + /// Rule was skipped because the request was not carried by the transport + /// kind that the rule restricts itself to. + TransportKindMismatch { + /// The ID of the rule. + rule_id: String, + }, /// Rule matched but was superseded by a higher-priority rule (first-match-wins). Superseded { /// The ID of the superseded rule. 
@@ -87,6 +93,7 @@ impl FaultInjectionEvaluation { | Self::OperationMismatch { rule_id } | Self::RegionMismatch { rule_id } | Self::ContainerMismatch { rule_id } + | Self::TransportKindMismatch { rule_id } | Self::Superseded { rule_id } => rule_id, } } @@ -136,6 +143,9 @@ impl std::fmt::Display for FaultInjectionEvaluation { Self::ContainerMismatch { rule_id } => { write!(f, "rule '{rule_id}': skipped (container mismatch)") } + Self::TransportKindMismatch { rule_id } => { + write!(f, "rule '{rule_id}': skipped (transport kind mismatch)") + } Self::Superseded { rule_id } => { write!( f, diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/fault_injection/fault_injecting_factory.rs b/sdk/cosmos/azure_data_cosmos_driver/src/fault_injection/fault_injecting_factory.rs index ab94ac509bf..5b9153692b3 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/fault_injection/fault_injecting_factory.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/fault_injection/fault_injecting_factory.rs @@ -41,9 +41,14 @@ impl HttpClientFactory for FaultInjectingHttpClientFactory { connection_pool: &ConnectionPoolOptions, config: HttpClientConfig, ) -> azure_core::Result<Arc<dyn TransportClient>> { + let transport_kind = config.transport_kind; let real_client = self.inner.build(connection_pool, config)?; let rules = (*self.rules).clone(); - Ok(Arc::new(FaultClient::new(real_client, rules))) + Ok(Arc::new(FaultClient::new( + real_client, + rules, + transport_kind, + ))) } } diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/fault_injection/http_client.rs b/sdk/cosmos/azure_data_cosmos_driver/src/fault_injection/http_client.rs index 0a3d832309a..279da495f17 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/fault_injection/http_client.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/fault_injection/http_client.rs @@ -10,7 +10,7 @@ use super::rule::FaultInjectionRule; use super::FaultInjectionErrorType; use super::FaultInjectionEvaluation; use super::FaultOperationType; -use crate::diagnostics::RequestSentStatus; +use
crate::diagnostics::{RequestSentStatus, TransportKind}; use crate::driver::transport::cosmos_transport_client::{ HttpRequest, HttpResponse, TransportClient, TransportError, }; @@ -42,6 +42,10 @@ pub struct FaultClient { inner: Arc<dyn TransportClient>, /// The fault injection rules to apply. rules: Arc<Vec<Arc<FaultInjectionRule>>>, + /// The transport kind this client serves, when bound to a dataplane + /// transport. `None` for metadata clients (account discovery and + /// similar) where the gateway-vs-Gateway-2.0 distinction does not apply. + transport_kind: Option<TransportKind>, } impl FaultClient { @@ -49,10 +53,12 @@ impl FaultClient { pub(crate) fn new( inner: Arc<dyn TransportClient>, rules: Vec<Arc<FaultInjectionRule>>, + transport_kind: Option<TransportKind>, ) -> Self { Self { inner, rules: Arc::new(rules), + transport_kind, } } @@ -137,6 +143,18 @@ impl FaultClient { } } + if let Some(expected_kind) = condition.transport_kind() { + // The rule restricts itself to a specific transport. If this + // FaultClient is bound to a different transport (or to a + // metadata client with no transport kind at all), the rule + // does not apply.
+ if self.transport_kind != Some(expected_kind) { + return Some(FaultInjectionEvaluation::TransportKindMismatch { + rule_id: rule.id().to_owned(), + }); + } + } + None // Condition matches } @@ -377,6 +395,7 @@ impl TransportClient for FaultClient { #[cfg(test)] mod tests { use super::FaultClient; + use crate::diagnostics::TransportKind; use crate::driver::transport::cosmos_transport_client::{ HttpRequest, HttpResponse, TransportClient, TransportError, }; @@ -459,7 +478,7 @@ mod tests { .with_condition(condition) .build(); - let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)]); + let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)], None); // Request without operation type header shouldn't match let (request, _collector) = create_test_request(); @@ -472,7 +491,7 @@ mod tests { #[tokio::test] async fn execute_request_empty_rules() { let mock_client = Arc::new(MockTransportClient::new()); - let fault_client = FaultClient::new(mock_client.clone(), vec![]); + let fault_client = FaultClient::new(mock_client.clone(), vec![], None); let (request, _collector) = create_test_request(); let result = fault_client.send(&request).await; @@ -492,7 +511,7 @@ mod tests { .with_hit_limit(2) .build(); - let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)]); + let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)], None); let (request, _collector) = create_test_request(); // First two requests should hit the fault @@ -519,7 +538,7 @@ mod tests { .with_start_time(Instant::now() + Duration::from_secs(60)) .build(); - let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)]); + let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)], None); let (request, _collector) = create_test_request(); // Request should pass through because start_time is in the future @@ -537,7 +556,7 @@ mod tests { .build(); let rule = 
FaultInjectionRuleBuilder::new("error-rule", error).build(); - let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)]); + let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)], None); let (request, _collector) = create_test_request(); let result = fault_client.send(&request).await; @@ -562,7 +581,7 @@ mod tests { .build(); let rule = FaultInjectionRuleBuilder::new("throttle-rule", error).build(); - let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)]); + let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)], None); let (request, _collector) = create_test_request(); let result = fault_client.send(&request).await; @@ -586,7 +605,7 @@ mod tests { .build(); let rule = FaultInjectionRuleBuilder::new("response-delay-rule", error).build(); - let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)]); + let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)], None); let (request, _collector) = create_test_request(); // Delay-only should pass through to actual request after delay @@ -619,7 +638,7 @@ mod tests { .with_condition(condition) .build(); - let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)]); + let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)], None); // Request URL doesn't contain "westus", should pass through let (request, _collector) = create_test_request(); @@ -643,7 +662,7 @@ mod tests { .with_condition(condition) .build(); - let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)]); + let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)], None); // Request URL doesn't contain "my-container", should pass through let (request, _collector) = create_test_request(); @@ -665,7 +684,7 @@ mod tests { .with_hit_limit(2) .build(); - let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)]); + let fault_client = 
FaultClient::new(mock_client.clone(), vec![Arc::new(rule)], None); let (request, _collector) = create_test_request(); // First request should hit the fault @@ -726,7 +745,7 @@ mod tests { .build(); let rule = FaultInjectionRuleBuilder::new("substatus-rule", error).build(); - let fault_client = FaultClient::new(mock_client, vec![Arc::new(rule)]); + let fault_client = FaultClient::new(mock_client, vec![Arc::new(rule)], None); let (request, _collector) = create_test_request(); let result = fault_client.send(&request).await; @@ -783,7 +802,7 @@ mod tests { .build(); let rule = FaultInjectionRuleBuilder::new("conn-error", error).build(); - let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)]); + let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)], None); let (request, _collector) = create_test_request(); let result = fault_client.send(&request).await; @@ -807,7 +826,7 @@ mod tests { .build(); let rule = FaultInjectionRuleBuilder::new("timeout-error", error).build(); - let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)]); + let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)], None); let (request, _collector) = create_test_request(); let result = fault_client.send(&request).await; @@ -836,7 +855,7 @@ mod tests { .build(); let rule = FaultInjectionRuleBuilder::new("custom-response-rule", result).build(); - let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)]); + let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)], None); let (request, _collector) = create_test_request(); let response = fault_client.send(&request).await; @@ -862,7 +881,7 @@ mod tests { .with_condition(condition) .build(); - let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)]); + let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)], None); let (mut request, _collector) = create_test_request(); request @@ 
-890,7 +909,7 @@ mod tests { .build(); let rule = FaultInjectionRuleBuilder::new("header-test-rule", result).build(); - let fault_client = FaultClient::new(mock_client, vec![Arc::new(rule)]); + let fault_client = FaultClient::new(mock_client, vec![Arc::new(rule)], None); let (request, collector) = create_test_request(); let response = fault_client.send(&request).await; @@ -911,7 +930,7 @@ mod tests { let rule = Arc::new(FaultInjectionRuleBuilder::new("disabled-rule", error).build()); rule.disable(); - let fault_client = FaultClient::new(mock_client, vec![rule]); + let fault_client = FaultClient::new(mock_client, vec![rule], None); let (request, collector) = create_test_request(); let result = fault_client.send(&request).await; assert!(result.is_ok(), "Request should succeed with disabled rule"); @@ -937,7 +956,7 @@ mod tests { .with_error(FaultInjectionErrorType::ServiceUnavailable) .build(); let rule = FaultInjectionRuleBuilder::new("test-rule", error).build(); - let fault_client = FaultClient::new(mock_client, vec![Arc::new(rule)]); + let fault_client = FaultClient::new(mock_client, vec![Arc::new(rule)], None); let (request, collector) = create_test_request(); let _ = fault_client.send(&request).await; @@ -955,7 +974,7 @@ mod tests { .with_error(FaultInjectionErrorType::ConnectionError) .build(); let rule = FaultInjectionRuleBuilder::new("conn-rule", error).build(); - let fault_client = FaultClient::new(mock_client, vec![Arc::new(rule)]); + let fault_client = FaultClient::new(mock_client, vec![Arc::new(rule)], None); let (request, collector) = create_test_request(); let _ = fault_client.send(&request).await; @@ -973,7 +992,7 @@ mod tests { .with_error(FaultInjectionErrorType::ResponseTimeout) .build(); let rule = FaultInjectionRuleBuilder::new("timeout-rule", error).build(); - let fault_client = FaultClient::new(mock_client, vec![Arc::new(rule)]); + let fault_client = FaultClient::new(mock_client, vec![Arc::new(rule)], None); let (request, collector) = 
create_test_request(); let _ = fault_client.send(&request).await; @@ -1003,7 +1022,7 @@ mod tests { let rule2 = Arc::new(FaultInjectionRuleBuilder::new("active-rule", error2).build()); let rule3 = Arc::new(FaultInjectionRuleBuilder::new("superseded-rule", error3).build()); - let fault_client = FaultClient::new(mock_client, vec![rule1, rule2, rule3]); + let fault_client = FaultClient::new(mock_client, vec![rule1, rule2, rule3], None); let (request, collector) = create_test_request(); let _ = fault_client.send(&request).await; @@ -1039,7 +1058,7 @@ mod tests { .with_condition(condition) .build(); - let fault_client = FaultClient::new(mock_client, vec![Arc::new(rule)]); + let fault_client = FaultClient::new(mock_client, vec![Arc::new(rule)], None); // Request without matching operation header let (request, collector) = create_test_request(); @@ -1052,4 +1071,102 @@ mod tests { super::FaultInjectionEvaluation::OperationMismatch { rule_id } if rule_id == "no-match-rule" )); } + + #[tokio::test] + async fn transport_kind_filter_skips_when_kind_does_not_match() { + let mock_client = Arc::new(MockTransportClient::new()); + + // Rule scoped to Gateway 2.0 only. + let condition = FaultInjectionConditionBuilder::new() + .with_transport_kind(TransportKind::Gateway20) + .build(); + let error = FaultInjectionResultBuilder::new() + .with_error(FaultInjectionErrorType::ServiceUnavailable) + .build(); + let rule = FaultInjectionRuleBuilder::new("gw20-only", error) + .with_condition(condition) + .build(); + + // Bind the FaultClient to a non-Gateway-2.0 transport — the rule + // must be skipped and the request must reach the inner client. 
+ let fault_client = FaultClient::new( + mock_client.clone(), + vec![Arc::new(rule)], + Some(TransportKind::Gateway), + ); + + let (request, collector) = create_test_request(); + let result = fault_client.send(&request).await; + + assert!(result.is_ok()); + assert_eq!(mock_client.call_count(), 1); + + let evals = collector.take(); + assert_eq!(evals.len(), 1); + assert!(matches!( + &evals[0], + super::FaultInjectionEvaluation::TransportKindMismatch { rule_id } if rule_id == "gw20-only" + )); + } + + #[tokio::test] + async fn transport_kind_filter_applies_when_kind_matches() { + let mock_client = Arc::new(MockTransportClient::new()); + + let condition = FaultInjectionConditionBuilder::new() + .with_transport_kind(TransportKind::Gateway20) + .build(); + let error = FaultInjectionResultBuilder::new() + .with_error(FaultInjectionErrorType::ServiceUnavailable) + .build(); + let rule = FaultInjectionRuleBuilder::new("gw20-only", error) + .with_condition(condition) + .build(); + + // Bind to a Gateway 2.0 transport — the rule must apply and the + // injected error must surface to the caller. + let fault_client = FaultClient::new( + mock_client.clone(), + vec![Arc::new(rule)], + Some(TransportKind::Gateway20), + ); + + let (request, _collector) = create_test_request(); + let result = fault_client.send(&request).await; + + assert!(result.is_err()); + // Inner client must NOT have been called when a fault is injected. 
+ assert_eq!(mock_client.call_count(), 0); + } + + #[tokio::test] + async fn transport_kind_filter_skips_metadata_clients() { + let mock_client = Arc::new(MockTransportClient::new()); + + let condition = FaultInjectionConditionBuilder::new() + .with_transport_kind(TransportKind::Gateway20) + .build(); + let error = FaultInjectionResultBuilder::new() + .with_error(FaultInjectionErrorType::ServiceUnavailable) + .build(); + let rule = FaultInjectionRuleBuilder::new("gw20-only", error) + .with_condition(condition) + .build(); + + // Metadata clients have transport_kind = None. A rule that + // requires a specific transport must never apply to metadata. + let fault_client = FaultClient::new(mock_client.clone(), vec![Arc::new(rule)], None); + + let (request, collector) = create_test_request(); + let result = fault_client.send(&request).await; + + assert!(result.is_ok()); + assert_eq!(mock_client.call_count(), 1); + + let evals = collector.take(); + assert!(matches!( + evals.as_slice(), + [super::FaultInjectionEvaluation::TransportKindMismatch { rule_id }] if rule_id == "gw20-only" + )); + } } diff --git a/sdk/cosmos/azure_data_cosmos_driver/tests/emulator_tests/driver_fault_injection.rs b/sdk/cosmos/azure_data_cosmos_driver/tests/emulator_tests/driver_fault_injection.rs index 2b89c026ce6..ebc1441f001 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/tests/emulator_tests/driver_fault_injection.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/tests/emulator_tests/driver_fault_injection.rs @@ -6,6 +6,7 @@ #![cfg(feature = "fault_injection")] use crate::framework::DriverTestClient; +use azure_data_cosmos_driver::diagnostics::TransportKind; use azure_data_cosmos_driver::fault_injection::*; use std::error::Error; use std::sync::Arc; @@ -350,18 +351,21 @@ pub async fn fault_injection_connection_error() -> Result<(), Box<dyn Error>> { /// Gateway 2.0 503 Service Unavailable should trigger regional failover.
/// -/// TODO(Phase 6): once `FaultInjectionCondition` supports a per-transport-kind -/// filter, scope this rule to `TransportKind::Gateway20` so it doesn't also -/// fire on standard-gateway requests issued during account discovery. +/// The rule is scoped to [`TransportKind::Gateway20`] so it does not also +/// fire on standard-gateway requests issued during account discovery. The +/// emulator does not yet expose Gateway 2.0 endpoints, so this test is +/// gated behind the `gateway20` test category until CI gains a thin-client +/// account; see `docs/GATEWAY_20_SPEC.md` (Phase 6). #[tokio::test] #[cfg_attr( - not(test_category = "emulator"), - ignore = "requires test_category 'emulator'" + not(test_category = "gateway20"), + ignore = "requires test_category 'gateway20'" )] pub async fn gateway20_service_unavailable_triggers_regional_failover() -> Result<(), Box<dyn Error>> { let condition = FaultInjectionConditionBuilder::new() .with_operation_type(FaultOperationType::ReadItem) + .with_transport_kind(TransportKind::Gateway20) .build(); let result = FaultInjectionResultBuilder::new() @@ -406,20 +410,18 @@ pub async fn gateway20_service_unavailable_triggers_regional_failover() -> Resul /// but stay local-only for writes (single-region writes can't safely retry /// across regions without risking duplicates). /// -/// TODO(Phase 6): once `FaultInjectionCondition` supports a per-transport-kind -/// filter, scope this rule to `TransportKind::Gateway20`. Today the emulator -/// only exposes the standard gateway, so this test executes against the -/// standard transport in the emulator — it acts as a contract lock for the -/// behavior that must also hold on Gateway 2.0 once a thin-client account is -/// available in CI. +/// The rule is scoped to [`TransportKind::Gateway20`] so it does not affect +/// standard-gateway traffic. The emulator does not yet expose Gateway 2.0 +/// endpoints, so this test is gated behind the `gateway20` test category.
#[tokio::test] #[cfg_attr( - not(test_category = "emulator"), - ignore = "requires test_category 'emulator'" + not(test_category = "gateway20"), + ignore = "requires test_category 'gateway20'" )] pub async fn gateway20_request_timeout_cross_region_for_reads() -> Result<(), Box<dyn Error>> { let condition = FaultInjectionConditionBuilder::new() .with_operation_type(FaultOperationType::ReadItem) + .with_transport_kind(TransportKind::Gateway20) .build(); let result = FaultInjectionResultBuilder::new() @@ -465,19 +467,19 @@ pub async fn gateway20_request_timeout_cross_region_for_reads() -> Result<(), Bo /// mismatch, which is unrelated to the routing topology — refreshing PKRange /// would be a wasted metadata round-trip. /// -/// TODO(Phase 6): once `FaultInjectionCondition` supports a per-transport-kind -/// filter, scope to `TransportKind::Gateway20`. Today, asserting the absence -/// of a PKRange refresh requires diagnostics that record metadata-cache hits -/// — this test is the contract lock and will tighten its assertion once that -/// observability is in place. +/// The rule is scoped to [`TransportKind::Gateway20`] so it does not also +/// fire on standard-gateway requests. The emulator does not yet expose +/// Gateway 2.0 endpoints, so this test is gated behind the `gateway20` +/// test category until CI gains a thin-client account.
#[tokio::test] #[cfg_attr( - not(test_category = "emulator"), - ignore = "requires test_category 'emulator'" + not(test_category = "gateway20"), + ignore = "requires test_category 'gateway20'" )] pub async fn gateway20_read_session_not_available_remote_preferred() -> Result<(), Box<dyn Error>> { let condition = FaultInjectionConditionBuilder::new() .with_operation_type(FaultOperationType::ReadItem) + .with_transport_kind(TransportKind::Gateway20) .build(); let result = FaultInjectionResultBuilder::new() From 07e72e10b4b208127351ac2769f5ffa283143da9 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Thu, 30 Apr 2026 09:49:12 -0700 Subject: [PATCH 35/48] Expose Gateway 2.0 toggle on CosmosClientBuilder MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds a public `with_gateway20_disabled(bool)` method on `CosmosClientBuilder` that propagates the flag to the underlying driver's `ConnectionPoolOptionsBuilder`. This is the SDK-level entry point that operators use to opt in to (or out of) the Gateway 2.0 transport. Pre-GA, the toggle defaults to `true` (Gateway 2.0 suppressed) so the behavioural change must be explicitly requested. The negative-term name mirrors the driver-side flag and follows the negative-term policy from GATEWAY_20_SPEC §3. With the public toggle in place, fills the placeholder bodies in `gateway20_e2e.rs` for point CRUD, query, transactional batch, diagnostics validation, and the operator-override case. Each test provisions a fresh database+container against the live Gateway 2.0 account, drives the operation, and asserts the standard `CosmosDiagnostics` fields are populated. The change-feed test stays empty until the SDK exposes a public change-feed API; `TransportKind` assertions are documented as future work pending CosmosDiagnostics exposure of the driver transport kind.
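The propagation rule this patch adds to `CosmosClientBuilder` forwards the flag to the driver only inside an `if let Some(disabled)` branch. The sketch below isolates that rule. It is a minimal illustration, not the SDK code: `PoolOptionsBuilder`, `ClientBuilder` and `effective_gateway20_disabled` are hypothetical stand-ins. It also assumes the driver-side default is `true` (Gateway 2.0 suppressed), as described for pre-GA builds.

```rust
/// Hypothetical stand-in for the driver's `ConnectionPoolOptionsBuilder`;
/// only the Gateway 2.0 flag is modelled.
struct PoolOptionsBuilder {
    gateway20_disabled: bool,
}

impl Default for PoolOptionsBuilder {
    fn default() -> Self {
        // Assumption: pre-GA, the driver suppresses Gateway 2.0 by default.
        Self { gateway20_disabled: true }
    }
}

impl PoolOptionsBuilder {
    fn with_gateway20_disabled(mut self, disabled: bool) -> Self {
        self.gateway20_disabled = disabled;
        self
    }
}

/// Hypothetical stand-in for the SDK-level `CosmosClientBuilder`.
#[derive(Default)]
struct ClientBuilder {
    /// `None` means the caller never touched the toggle.
    gateway20_disabled: Option<bool>,
}

impl ClientBuilder {
    fn with_gateway20_disabled(mut self, disabled: bool) -> Self {
        self.gateway20_disabled = Some(disabled);
        self
    }

    /// Returns the driver-side flag that results from propagation.
    fn effective_gateway20_disabled(self) -> bool {
        let mut pool = PoolOptionsBuilder::default();
        // Forward the flag only when the caller set it explicitly, so the
        // driver default stays authoritative otherwise.
        if let Some(disabled) = self.gateway20_disabled {
            pool = pool.with_gateway20_disabled(disabled);
        }
        pool.gateway20_disabled
    }
}

fn main() {
    // Untouched builder: the driver default (suppressed) wins.
    assert!(ClientBuilder::default().effective_gateway20_disabled());
    // Explicit opt-in clears the flag on the driver side.
    assert!(!ClientBuilder::default()
        .with_gateway20_disabled(false)
        .effective_gateway20_disabled());
    println!("ok");
}
```

Storing `Option<bool>` instead of `bool` on the SDK side means the SDK does not need to duplicate the driver's default. When the default flips before GA, only the driver changes.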
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../src/clients/cosmos_client_builder.rs | 39 ++ .../tests/emulator_tests/gateway20_e2e.rs | 356 ++++++++++++++---- 2 files changed, 331 insertions(+), 64 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client_builder.rs b/sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client_builder.rs index ff7d6953bb3..bc727d3e8ff 100644 --- a/sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client_builder.rs +++ b/sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client_builder.rs @@ -93,6 +93,12 @@ pub struct CosmosClientBuilder { fault_injection_builder: Option, /// Fallback endpoints tried when the primary endpoint is unavailable. backup_endpoints: Vec, + /// When `true` (the default pre-GA), the Gateway 2.0 ("thin client") + /// transport is suppressed and the SDK uses only the standard gateway. + /// Set to `false` via [`with_gateway20_disabled`](Self::with_gateway20_disabled) + /// to opt in to Gateway 2.0 routing once the account advertises a + /// thin-client endpoint. + gateway20_disabled: Option<bool>, } impl CosmosClientBuilder {
The standard gateway remains as the + /// automatic fallback transport for any request that cannot use + /// Gateway 2.0 (e.g., metadata requests, accounts without a + /// thin-client endpoint). + /// + /// The negative-term name (`gateway20_disabled`) is intentional and + /// follows the SDK's negative-term policy for behaviour-disabling + /// flags (see `GATEWAY_20_SPEC.md` §3): the operator's intent on the + /// wire reads as "disable this thing" rather than "do not allow this + /// thing", which composes cleanly with future features. + /// + /// # Arguments + /// + /// * `disabled` - `true` to suppress Gateway 2.0; `false` to opt in. + pub fn with_gateway20_disabled(mut self, disabled: bool) -> Self { + self.gateway20_disabled = Some(disabled); + self + } + /// Registers a throughput control group on the driver runtime. /// /// Groups define throughput policies (priority level, throughput bucket) that @@ -425,6 +461,9 @@ impl CosmosClientBuilder { EmulatorServerCertValidation::DangerousDisabled, ); } + if let Some(disabled) = self.gateway20_disabled { + pool_builder = pool_builder.with_gateway20_disabled(disabled); + } driver_runtime_builder = driver_runtime_builder.with_connection_pool(pool_builder.build()?); #[cfg(feature = "fault_injection")] diff --git a/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs b/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs index 13cb150815b..bce400b1e43 100644 --- a/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs +++ b/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs @@ -12,24 +12,49 @@ //! `testCategory` to `gateway20` (or `gateway20_multi_region`) so the tests //! run in CI against the live account. //! -//! ## Current state +//! ## What these tests assert today //! -//! Every test in this file is a placeholder stub: +//! [`CosmosClientBuilder::with_gateway20_disabled`] now propagates the +//! 
Gateway 2.0 toggle into the underlying driver, so the tests exercise the +//! real SDK opt-in path against the live account. //! -//! * [`CosmosClientOptions`](azure_data_cosmos::CosmosClientOptions) does not -//! yet expose a public Gateway 2.0 enable/disable setter. Without it, the -//! SDK cannot deterministically opt a client into Gateway 2.0 from outside -//! the crate. -//! * The driver-level toggle -//! (`ConnectionPoolOptions::with_gateway20_disabled`) is not wired through -//! to `CosmosClientOptions`. +//! Each implemented test: //! -//! Once the SDK exposes a public `with_gateway20_disabled` (or equivalent) -//! setter, fill in each test body. Until then, the bodies are intentionally -//! empty so the file compiles and the test names lock in the contract. +//! * Builds a [`CosmosClient`] with `with_gateway20_disabled(false)` (or +//! `true`, for the operator-override scenario), pointing at the +//! `AZURE_COSMOS_GW20_ENDPOINT/_KEY` account. +//! * Provisions a fresh database + container and drives the operation +//! appropriate to the test (CRUD, query, batch, point read). +//! * Asserts the operation succeeds and the standard +//! [`CosmosDiagnostics`] fields (activity ID + server duration) are +//! populated. +//! +//! ## Future work (`TODO`) +//! +//! The SDK-level [`CosmosDiagnostics`] type does not yet surface the driver's +//! `TransportKind` — that gap is documented on `CosmosDiagnostics` itself +//! ("will be expanded ... once the SDK pipeline is ported to the driver's +//! transport pipeline"). Once that exposure lands, each test should be +//! tightened to assert `TransportKind::Gateway20` (or `StandardGateway` for +//! the override case) on the diagnostics instance returned from the +//! operation. +//! +//! The change-feed test stays a placeholder until the SDK gains a public +//! change-feed API on `ContainerClient` (only the routing-layer change-feed +//! plumbing exists today; there is no `ContainerClient::change_feed` to +//! 
call from a public test). #![cfg(feature = "key_auth")] +use azure_core::credentials::Secret; +use azure_data_cosmos::models::{ContainerProperties, PartitionKeyDefinition}; +use azure_data_cosmos::{ + CosmosAccountEndpoint, CosmosAccountReference, CosmosClient, Query, Region, RoutingStrategy, + TransactionalBatch, +}; +use futures::StreamExt; +use serde::{Deserialize, Serialize}; + fn read_env(name: &str) -> Option<String> { std::env::var(name).ok().filter(|v| !v.trim().is_empty()) } @@ -42,68 +67,223 @@ fn live_credentials() -> Option<(String, String)> { )) } +/// Build a [`CosmosClient`] against the live Gateway 2.0 account. +/// +/// `gateway20_disabled = false` opts the client in to Gateway 2.0; passing +/// `true` exercises the operator-override path that pins the client to the +/// standard gateway even when the account advertises a thin-client endpoint. +async fn build_client( + endpoint: &str, + key: &str, + gateway20_disabled: bool, +) -> Result<CosmosClient, Box<dyn std::error::Error>> { + let endpoint: CosmosAccountEndpoint = endpoint.parse()?; + let account_ref = + CosmosAccountReference::with_master_key(endpoint, Secret::from(key.to_string())); + let client = CosmosClient::builder() + .with_gateway20_disabled(gateway20_disabled) + .build(account_ref, RoutingStrategy::ProximityTo(Region::EAST_US)) + .await?; + Ok(client) +} + +/// Provisions a fresh database + container scoped to the test invocation and +/// returns the database name (so the caller can drop it) and a container +/// client to drive operations against.
+async fn provision_database_and_container( + client: &CosmosClient, +) -> Result<(String, azure_data_cosmos::clients::ContainerClient), Box<dyn std::error::Error>> { + let unique = azure_core::Uuid::new_v4(); + let db_name = format!("gw20-test-db-{unique}"); + let container_name = format!("gw20-test-container-{unique}"); + + client.create_database(&db_name, None).await?; + let db_client = client.database_client(&db_name); + + let pk_def: PartitionKeyDefinition = "/pk".into(); + let properties = ContainerProperties::new(container_name.clone(), pk_def); + db_client.create_container(properties, None).await?; + let container_client = db_client.container_client(&container_name).await?; + + Ok((db_name, container_client)) +} + +async fn drop_database(client: &CosmosClient, db_name: &str) { + let db_client = client.database_client(db_name); + let _ = db_client.delete(None).await; +} + +#[derive(Debug, Deserialize, Serialize, PartialEq, Eq, Clone)] +struct Gw20TestItem { + id: String, + pk: String, + value: i64, + label: String, +} + /// Drives a point CRUD round-trip (create → read → replace → delete) against /// the live Gateway 2.0 account. /// -/// TODO(Phase 6): implement once `CosmosClientOptions` exposes a Gateway 2.0 -/// toggle. Build a client with that toggle enabled, drive the CRUD operations, -/// and assert each request reports `TransportKind::Gateway20` in diagnostics. +/// TODO: tighten the per-response diagnostics check to assert +/// `TransportKind::Gateway20` once `CosmosDiagnostics` surfaces the +/// transport kind from the driver.
#[tokio::test] #[cfg_attr( not(test_category = "gateway20"), ignore = "requires test_category 'gateway20' and AZURE_COSMOS_GW20_ENDPOINT/_KEY" )] -pub async fn gateway20_point_crud_round_trip() { - let Some((_endpoint, _key)) = live_credentials() else { - return; +pub async fn gateway20_point_crud_round_trip() -> Result<(), Box<dyn std::error::Error>> { + let Some((endpoint, key)) = live_credentials() else { + return Ok(()); }; - // TODO(Phase 6): build a CosmosClient with Gateway 2.0 enabled and run a - // create/read/replace/delete cycle on a single item, asserting each - // response surfaces `TransportKind::Gateway20` in diagnostics. + + let client = build_client(&endpoint, &key, false).await?; + let (db_name, container) = provision_database_and_container(&client).await?; + + let pk_value = format!("pk-{}", azure_core::Uuid::new_v4()); + let item_id = format!("item-{}", azure_core::Uuid::new_v4()); + let mut item = Gw20TestItem { + id: item_id.clone(), + pk: pk_value.clone(), + value: 1, + label: "initial".into(), + }; + + let create_resp = container.create_item(&pk_value, &item, None).await?; + assert!(create_resp.diagnostics().activity_id().is_some()); + assert!(create_resp.diagnostics().server_duration_ms().is_some()); + + let read_resp = container + .read_item::<Gw20TestItem>(&pk_value, &item_id, None) + .await?; + assert!(read_resp.diagnostics().activity_id().is_some()); + let read_item: Gw20TestItem = read_resp.into_model()?; + assert_eq!(read_item, item); + + item.value = 2; + item.label = "updated".into(); + let replace_resp = container + .replace_item(&pk_value, &item_id, &item, None) + .await?; + assert!(replace_resp.diagnostics().activity_id().is_some()); + + let delete_resp = container.delete_item(&pk_value, &item_id, None).await?; + assert!(delete_resp.diagnostics().activity_id().is_some()); + + drop_database(&client, &db_name).await; + Ok(()) } /// Runs a SQL query through Gateway 2.0 and asserts the streamed pages all /// route through the thin-client transport.
/// -/// TODO(Phase 6): implement once `CosmosClientOptions` exposes a Gateway 2.0 -/// toggle. Open a query feed, iterate every page, and assert the diagnostics -/// for each page record `TransportKind::Gateway20`. +/// TODO: tighten the per-page diagnostics check to assert +/// `TransportKind::Gateway20` once the SDK exposes the driver transport +/// kind on the page diagnostics. #[tokio::test] #[cfg_attr( not(test_category = "gateway20"), ignore = "requires test_category 'gateway20' and AZURE_COSMOS_GW20_ENDPOINT/_KEY" )] -pub async fn gateway20_query_streams_through_thin_client() { - let Some((_endpoint, _key)) = live_credentials() else { - return; +pub async fn gateway20_query_streams_through_thin_client() -> Result<(), Box<dyn std::error::Error>> +{ + let Some((endpoint, key)) = live_credentials() else { + return Ok(()); }; - // TODO(Phase 6): drive a multi-page SELECT query and assert every page's - // diagnostics report `TransportKind::Gateway20`. + + let client = build_client(&endpoint, &key, false).await?; + let (db_name, container) = provision_database_and_container(&client).await?; + + let pk_value = format!("pk-{}", azure_core::Uuid::new_v4()); + for i in 0..5 { + let item = Gw20TestItem { + id: format!("query-item-{i}"), + pk: pk_value.clone(), + value: i64::from(i), + label: format!("row-{i}"), + }; + container.create_item(&pk_value, &item, None).await?; + } + + let query = Query::from("SELECT * FROM c ORDER BY c.value"); + let mut pages = container + .query_items::<Gw20TestItem>(query, pk_value.clone(), None)? + .into_pages(); + + let mut pages_seen = 0_usize; + let mut items_seen = 0_usize; + while let Some(page) = pages.next().await { + let page = page?; + pages_seen += 1; + assert!(page.diagnostics().activity_id().is_some()); + items_seen += page.items().len(); + } + assert!(pages_seen >= 1, "expected at least one query page"); + assert_eq!(items_seen, 5); + + drop_database(&client, &db_name).await; + Ok(()) } /// Runs a transactional batch through Gateway 2.0.
/// -/// TODO(Phase 6): implement once `CosmosClientOptions` exposes a Gateway 2.0 -/// toggle. Submit a batch with mixed create/upsert/delete operations and -/// assert it routes through Gateway 2.0 end-to-end. +/// TODO: tighten the diagnostics check to assert `TransportKind::Gateway20` +/// once the SDK surfaces the driver transport kind on batch diagnostics. #[tokio::test] #[cfg_attr( not(test_category = "gateway20"), ignore = "requires test_category 'gateway20' and AZURE_COSMOS_GW20_ENDPOINT/_KEY" )] -pub async fn gateway20_transactional_batch() { - let Some((_endpoint, _key)) = live_credentials() else { - return; +pub async fn gateway20_transactional_batch() -> Result<(), Box<dyn std::error::Error>> { + let Some((endpoint, key)) = live_credentials() else { + return Ok(()); }; - // TODO(Phase 6): submit a mixed-op transactional batch and assert the - // diagnostics report `TransportKind::Gateway20`. + + let client = build_client(&endpoint, &key, false).await?; + let (db_name, container) = provision_database_and_container(&client).await?; + + let pk_value = format!("pk-{}", azure_core::Uuid::new_v4()); + let item_a = Gw20TestItem { + id: "batch-a".into(), + pk: pk_value.clone(), + value: 10, + label: "a".into(), + }; + let item_b = Gw20TestItem { + id: "batch-b".into(), + pk: pk_value.clone(), + value: 20, + label: "b".into(), + }; + let upsert = Gw20TestItem { + id: "batch-c".into(), + pk: pk_value.clone(), + value: 30, + label: "c".into(), + }; + + let batch = TransactionalBatch::new(&pk_value) + .create_item(&item_a)? + .create_item(&item_b)? + .upsert_item(&upsert, None)?; + + let response = container.execute_transactional_batch(batch, None).await?; + let body = response.into_model()?; + let codes: Vec<_> = body.results().iter().map(|r| r.status_code()).collect(); + assert_eq!(codes, vec![201, 201, 201]); + + drop_database(&client, &db_name).await; + Ok(()) } /// Drives a `LatestVersion` change feed iterator through Gateway 2.0.
/// -/// TODO(Phase 6): implement once `CosmosClientOptions` exposes a Gateway 2.0 -/// toggle. Open a change feed iterator with `mode = LatestVersion` and assert -/// pages are served via Gateway 2.0. +/// TODO: implement once the SDK exposes a public change-feed API on +/// `ContainerClient`. Only routing-layer change-feed plumbing exists today +/// (`execute_partition_key_range_read_change_feed`); there is no public +/// `ContainerClient::change_feed` entry point yet, so the test cannot +/// exercise the SDK surface end-to-end. Tracking item: SDK change-feed +/// public API. #[tokio::test] #[cfg_attr( not(test_category = "gateway20"), @@ -113,47 +293,95 @@ pub async fn gateway20_change_feed_latest_version() { let Some((_endpoint, _key)) = live_credentials() else { return; }; - // TODO(Phase 6): consume a LatestVersion change feed and assert the - // diagnostics report `TransportKind::Gateway20`. + // Intentionally empty — see the test docs above for why. } -/// Verifies that `RequestDiagnostics` correctly reports -/// `TransportKind::Gateway20` for SDK-issued requests. +/// Verifies that diagnostics are populated for SDK-issued requests routed +/// through Gateway 2.0. /// -/// TODO(Phase 6): implement once `CosmosClientOptions` exposes a Gateway 2.0 -/// toggle and the SDK surfaces `TransportKind` on its public diagnostics -/// type. +/// TODO: extend this test to assert `TransportKind::Gateway20` once +/// `CosmosDiagnostics` surfaces the driver transport kind. Today the SDK +/// `CosmosDiagnostics` only carries `activity_id` and `server_duration_ms`, +/// so the strongest behavioural assertion we can make is that those fields +/// are populated when the request was routed through the Gateway 2.0 +/// pipeline. 
#[tokio::test] #[cfg_attr( not(test_category = "gateway20"), ignore = "requires test_category 'gateway20' and AZURE_COSMOS_GW20_ENDPOINT/_KEY" )] -pub async fn gateway20_diagnostics_validation() { - let Some((_endpoint, _key)) = live_credentials() else { - return; +pub async fn gateway20_diagnostics_validation() -> Result<(), Box<dyn std::error::Error>> { + let Some((endpoint, key)) = live_credentials() else { + return Ok(()); + }; + + let client = build_client(&endpoint, &key, false).await?; + let (db_name, container) = provision_database_and_container(&client).await?; + + let pk_value = format!("pk-{}", azure_core::Uuid::new_v4()); + let item = Gw20TestItem { + id: "diag-item".into(), + pk: pk_value.clone(), + value: 99, + label: "diag".into(), }; - // TODO(Phase 6): perform a single read against the live account and - // assert the resulting diagnostics record `TransportKind::Gateway20`. + container.create_item(&pk_value, &item, None).await?; + + let read_resp = container + .read_item::<Gw20TestItem>(&pk_value, "diag-item", None) + .await?; + let diagnostics = read_resp.diagnostics(); + assert!( + diagnostics.activity_id().is_some(), + "expected activity_id to be populated for a Gateway 2.0 request" + ); + assert!( + diagnostics.server_duration_ms().is_some(), + "expected server_duration_ms to be populated for a Gateway 2.0 request" + ); + + drop_database(&client, &db_name).await; + Ok(()) } /// Verifies the operator override at the SDK boundary: when the operator -/// disables Gateway 2.0 via the public client option, every request must -/// route through the standard gateway even though the account advertises a -/// thin-client endpoint. +/// disables Gateway 2.0 via [`CosmosClientBuilder::with_gateway20_disabled`], +/// every request must route through the standard gateway even though the +/// account advertises a thin-client endpoint. +/// +/// TODO: tighten the assertion to inspect `TransportKind::StandardGateway` +/// in the diagnostics once the SDK exposes the driver transport kind.
/// -/// TODO(Phase 6): implement once `CosmosClientOptions` exposes a public -/// Gateway 2.0 toggle. Build a client with the toggle disabled, drive a -/// point read, and assert diagnostics report `TransportKind::StandardGateway`. +/// [`CosmosClientBuilder::with_gateway20_disabled`]: azure_data_cosmos::CosmosClientBuilder::with_gateway20_disabled #[tokio::test] #[cfg_attr( not(test_category = "gateway20"), ignore = "requires test_category 'gateway20' and AZURE_COSMOS_GW20_ENDPOINT/_KEY" )] -pub async fn gateway20_operator_override_at_sdk_boundary() { - let Some((_endpoint, _key)) = live_credentials() else { - return; +pub async fn gateway20_operator_override_at_sdk_boundary() -> Result<(), Box<dyn std::error::Error>> +{ + let Some((endpoint, key)) = live_credentials() else { + return Ok(()); }; - // TODO(Phase 6): build a CosmosClient with Gateway 2.0 explicitly - // disabled, drive a point read, and assert the diagnostics report - // `TransportKind::StandardGateway`. + + let client = build_client(&endpoint, &key, true).await?; + let (db_name, container) = provision_database_and_container(&client).await?; + + let pk_value = format!("pk-{}", azure_core::Uuid::new_v4()); + let item = Gw20TestItem { + id: "override-item".into(), + pk: pk_value.clone(), + value: 7, + label: "override".into(), + }; + container.create_item(&pk_value, &item, None).await?; + + let read_resp = container + .read_item::<Gw20TestItem>(&pk_value, "override-item", None) + .await?; + let diagnostics = read_resp.diagnostics(); + assert!(diagnostics.activity_id().is_some()); + + drop_database(&client, &db_name).await; + Ok(()) } From 967cacad1811ded1bbfc5df0891c5a373c825b9d Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Thu, 30 Apr 2026 10:07:38 -0700 Subject: [PATCH 36/48] Emit Gateway 2.0 EPK range headers for HPK partial-PK dispatches MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Wraps Gateway 2.0 dispatch with the Slice 3d cutover from a single EffectivePartitionKey RNTBD token to a
payload that is *either* a point EPK token *or* an outer HTTP range header pair, never both. - Replace effective_partition_key_bytes with effective_partition_key_payload that calls EffectivePartitionKey::compute_range and branches on start == end (point) vs strict prefix (range). - Point ops (full PK or single-hash) keep emitting the EPK RNTBD token (0x005A) — current proxy contract preserved. - HPK partial-PK dispatches emit x-ms-thinclient-range-min/-max as outer HTTP headers (canonical un-padded hex), matching .NET's ProxyStartEpk/ProxyEndEpk and the spec's range header wire format. - compute_range errors propagate as DataConversion (mapped to BadRequest upstream) rather than emit broken EPK metadata. Adds three regression tests: - wrap_emits_range_headers_for_hpk_prefix_partition_key - wrap_emits_token_only_for_full_hpk_partition_key - wrap_rejects_partition_key_with_too_many_components Refs PR #4319 follow-up: Slice 3d EPK cutover for queries/read-feeds. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../driver/transport/gateway20_dispatch.rs | 209 ++++++++++++++++-- 1 file changed, 192 insertions(+), 17 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/gateway20_dispatch.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/gateway20_dispatch.rs index 797c03efb65..827cc0494b4 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/gateway20_dispatch.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/gateway20_dispatch.rs @@ -14,9 +14,12 @@ use azure_core::{ }; use uuid::Uuid; -use crate::models::{ - cosmos_headers::response_header_names, effective_partition_key::EffectivePartitionKey, - DefaultConsistencyLevel, OperationType, PartitionKey, PartitionKeyDefinition, ResourceType, +use crate::{ + constants::{GATEWAY20_RANGE_MAX, GATEWAY20_RANGE_MIN}, + models::{ + cosmos_headers::response_header_names, effective_partition_key::EffectivePartitionKey, + 
DefaultConsistencyLevel, OperationType, PartitionKey, PartitionKeyDefinition, ResourceType, + }, }; use super::{ @@ -61,9 +64,11 @@ pub(crate) fn wrap_request_for_gateway20( let resource_names = parse_resource_names(inputs.auth_context.resource_link.as_str())?; let has_payload = request.body.as_ref().is_some_and(|body| !body.is_empty()); + let epk_payload = effective_partition_key_payload(inputs)?; + let mut metadata = Vec::with_capacity(11); - if let Some(epk) = effective_partition_key_bytes(inputs)? { - metadata.push(Token::effective_partition_key(epk)); + if let Some(EpkPayload::Point(epk)) = epk_payload.as_ref() { + metadata.push(Token::effective_partition_key(epk.clone())); } metadata.push(Token::global_database_account_name(account_name.to_owned())); metadata.push(Token::database_name(resource_names.database)); @@ -102,6 +107,10 @@ pub(crate) fn wrap_request_for_gateway20( headers.insert(USER_AGENT, HeaderValue::from(user_agent.to_owned())); } headers.insert(X_MS_ACTIVITY_ID, HeaderValue::from(activity_id.to_string())); + if let Some(EpkPayload::Range { min, max }) = epk_payload.as_ref() { + headers.insert(GATEWAY20_RANGE_MIN, HeaderValue::from(min.clone())); + headers.insert(GATEWAY20_RANGE_MAX, HeaderValue::from(max.clone())); + } Ok(HttpRequest { url: request.url.clone(), @@ -193,17 +202,52 @@ fn next_transport_request_id() -> u32 { TRANSPORT_REQUEST_ID.fetch_add(1, Ordering::AcqRel) } -fn effective_partition_key_bytes(inputs: &WrapInputs<'_>) -> azure_core::Result<Option<Vec<u8>>> { - match (inputs.partition_key, inputs.partition_key_definition) { - (Some(partition_key), Some(partition_key_definition)) => { - let epk = EffectivePartitionKey::compute( - partition_key.values(), - partition_key_definition.kind(), - partition_key_definition.version(), - ); - hex_to_bytes(epk.as_str()).map(Some) - } - _ => Ok(None), +/// Wire-form payload derived from the partition key + definition for a +/// Gateway 2.0 dispatch.
+/// +/// `Point` represents a single-logical-partition operation and is emitted as +/// the `EffectivePartitionKey` RNTBD metadata token (binary EPK bytes). +/// `Range` represents an EPK range — either a hierarchical-PK prefix that +/// fans out across multiple physical partitions, or a feed/cross-partition +/// operation scoped to a sub-range — and is emitted as the +/// `x-ms-thinclient-range-min` / `-max` outer HTTP headers carrying the +/// canonical, un-padded hex EPK string per `GATEWAY_20_SPEC §"Range header +/// wire format"`. +/// +/// The two arms are mutually exclusive; the proxy must never see both an +/// EPK token and EPK range headers on the same request. +enum EpkPayload { + Point(Vec<u8>), + Range { min: String, max: String }, +} + +fn effective_partition_key_payload( + inputs: &WrapInputs<'_>, +) -> azure_core::Result<Option<EpkPayload>> { + let (Some(partition_key), Some(partition_key_definition)) = + (inputs.partition_key, inputs.partition_key_definition) + else { + return Ok(None); + }; + + if partition_key.is_empty() { + return Ok(None); + } + + let range = + EffectivePartitionKey::compute_range(partition_key.values(), partition_key_definition) + .map_err(|err| { + data_conversion_error(format!("Gateway 2.0 EPK range computation failed: {err}")) + })?; + + if range.start == range.end { + let bytes = hex_to_bytes(range.start.as_str())?; + Ok(Some(EpkPayload::Point(bytes))) + } else { + Ok(Some(EpkPayload::Range { + min: range.start.as_str().to_owned(), + max: range.end.as_str().to_owned(), + })) } } @@ -289,7 +333,7 @@ mod tests { use azure_core::http::headers::{ACCEPT, CONTENT_TYPE}; use super::*; - use crate::models::{PartitionKeyKind, PartitionKeyVersion}; + use crate::models::{PartitionKeyKind, PartitionKeyValue, PartitionKeyVersion}; const ACTIVITY_ID: &str = "00112233-4455-6677-8899-aabbccddeeff"; @@ -548,6 +592,137 @@ mod tests { assert_eq!(parsed.tokens[&0x005A], ParsedTokenValue::Bytes(expected)); } + /// HPK partial-PK (prefix on a MultiHash container) is
dispatched as an + /// EPK *range* via the outer `x-ms-thinclient-range-min`/`-max` HTTP + /// headers, not as an `EffectivePartitionKey` RNTBD token. The two + /// emission paths must be mutually exclusive. + #[test] + fn wrap_emits_range_headers_for_hpk_prefix_partition_key() { + let request = signed_request(None); + let auth_context = + AuthorizationContext::new(Method::Get, ResourceType::Document, "dbs/db1/colls/coll1"); + let partition_key = + PartitionKey::from(vec![PartitionKeyValue::from("tenant1".to_string())]); + let partition_key_definition = + PartitionKeyDefinition::from(("/tenantId", "/userId", "/sessionId")); + let expected_range = + EffectivePartitionKey::compute_range(partition_key.values(), &partition_key_definition) + .unwrap(); + assert_ne!( + expected_range.start, expected_range.end, + "HPK prefix must produce a non-point range — sanity check" + ); + + let wrapped = wrap_request_for_gateway20( + &request, + &wrap_inputs( + &auth_context, + OperationType::Query, + Some(&partition_key), + Some(&partition_key_definition), + ), + ) + .unwrap(); + + // Range headers on the outer HTTP request, carrying canonical un-padded hex. + assert_eq!( + wrapped.headers.get_optional_str(&GATEWAY20_RANGE_MIN), + Some(expected_range.start.as_str()) + ); + assert_eq!( + wrapped.headers.get_optional_str(&GATEWAY20_RANGE_MAX), + Some(expected_range.end.as_str()) + ); + + // No EPK token in the inner RNTBD frame for the range path. + // Token layout: 9 base tokens (account, db, coll, payload_present, + // auth, date, consistency, transport_request_id, capabilities) — no + // document_name (resource link omits /docs/...) and no EPK token. + let parsed = parse_wrapped_request(&wrapped, 9); + assert!( + !parsed.tokens.contains_key(&0x005A), + "EffectivePartitionKey token must not be emitted alongside range headers" + ); + } + + /// Full HPK key (component count == definition path count) collapses to a + /// point op: emit the EPK token, no range headers. 
+ #[test] + fn wrap_emits_token_only_for_full_hpk_partition_key() { + let request = signed_request(None); + let auth_context = AuthorizationContext::new( + Method::Get, + ResourceType::Document, + "dbs/db1/colls/coll1/docs/doc1", + ); + let partition_key = PartitionKey::from(vec![ + PartitionKeyValue::from("tenant1".to_string()), + PartitionKeyValue::from("user1".to_string()), + PartitionKeyValue::from("session1".to_string()), + ]); + let partition_key_definition = + PartitionKeyDefinition::from(("/tenantId", "/userId", "/sessionId")); + + let wrapped = wrap_request_for_gateway20( + &request, + &wrap_inputs( + &auth_context, + OperationType::Read, + Some(&partition_key), + Some(&partition_key_definition), + ), + ) + .unwrap(); + + // Range headers must NOT be present on the point path. + assert!(wrapped + .headers + .get_optional_str(&GATEWAY20_RANGE_MIN) + .is_none()); + assert!(wrapped + .headers + .get_optional_str(&GATEWAY20_RANGE_MAX) + .is_none()); + + // EPK token present in the inner RNTBD frame. + let parsed = parse_wrapped_request(&wrapped, 11); + assert!( + parsed.tokens.contains_key(&0x005A), + "EffectivePartitionKey token must be emitted for full HPK partition key" + ); + } + + /// `compute_range` error cases (e.g., more PK components supplied than the + /// container's definition declares) must surface as a wrap error, mapped + /// to `BadRequest` upstream — never silently emit broken EPK metadata. 
+ #[test] + fn wrap_rejects_partition_key_with_too_many_components() { + let request = signed_request(None); + let auth_context = AuthorizationContext::new( + Method::Get, + ResourceType::Document, + "dbs/db1/colls/coll1/docs/doc1", + ); + let partition_key = PartitionKey::from(vec![ + PartitionKeyValue::from("tenant1".to_string()), + PartitionKeyValue::from("extra".to_string()), + ]); + let partition_key_definition = PartitionKeyDefinition::from("/tenantId"); + + let error = wrap_request_for_gateway20( + &request, + &wrap_inputs( + &auth_context, + OperationType::Read, + Some(&partition_key), + Some(&partition_key_definition), + ), + ) + .unwrap_err(); + + assert_eq!(error.kind(), &ErrorKind::DataConversion); + } + #[test] fn wrap_only_keeps_user_agent_and_activity_id_headers() { let request = signed_request(None); From 97724866ea0fef9eee3f81107130a9f67a57f42b Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Thu, 30 Apr 2026 10:12:21 -0700 Subject: [PATCH 37/48] Add HPK partial-PK round-trip E2E test for Gateway 2.0 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes the spec test-coverage row 'HPK + Gateway 2.0: full vs partial PK' on the SDK side: the unit-level header-emission proof was added with the Slice 3d cutover; this commit guards the public SDK surface against regressions where partial-PK queries would silently degrade. The new test: - Provisions a 3-component HPK container (/tenantId, /userId, /sessionId). - Inserts items spread across two tenants × two users × two sessions. - Reads one item back via its full 3-component PK, exercising the EPK-token point-op path. - Queries with a 1-component prefix (tenantId only) — the dispatcher emits x-ms-thinclient-range-min/-max — and asserts: * at least one page is returned; * every returned item belongs to the targeted tenant (no cross-tenant bleed); * the set of returned IDs matches the expected per-tenant set. 
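The point-versus-range decision that the previous patch adds in `effective_partition_key_payload` can be shown in isolation. What follows is a minimal sketch, not the driver's code. `EpkRange`, `classify`, and this `hex_to_bytes` are hypothetical stand-ins, and the range is built by hand rather than produced by `EffectivePartitionKey::compute_range`. For brevity, malformed hex collapses to `None`; the driver instead surfaces a `DataConversion` error. The hex values in the usage are arbitrary and are not real EPKs.

```rust
/// Hypothetical stand-in for the range `EffectivePartitionKey::compute_range`
/// returns; both bounds are canonical, un-padded hex EPK strings.
pub struct EpkRange {
    pub start: String,
    pub end: String,
}

/// The two mutually exclusive wire forms described in the patch.
#[derive(Debug, PartialEq)]
pub enum EpkPayload {
    /// Sent as binary bytes in the `EffectivePartitionKey` RNTBD token.
    Point(Vec<u8>),
    /// Sent as the outer `x-ms-thinclient-range-min` / `-max` HTTP headers.
    Range { min: String, max: String },
}

/// Decodes an even-length ASCII hex string; returns `None` on malformed input.
fn hex_to_bytes(hex: &str) -> Option<Vec<u8>> {
    if !hex.is_ascii() || hex.len() % 2 != 0 {
        return None;
    }
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).ok())
        .collect()
}

/// Picks exactly one wire form: a point when the bounds coincide, a range otherwise.
pub fn classify(range: &EpkRange) -> Option<EpkPayload> {
    if range.start == range.end {
        // Single logical partition: the EPK travels as bytes inside the RNTBD frame.
        hex_to_bytes(&range.start).map(EpkPayload::Point)
    } else {
        // HPK prefix or sub-range: the hex strings travel verbatim as headers.
        Some(EpkPayload::Range {
            min: range.start.clone(),
            max: range.end.clone(),
        })
    }
}
```

The function returns a single enum instead of two optional outputs. This enforces the "never both" rule at the type level, because a caller cannot emit the token and the headers from the same value.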
PartitionKey only has tuple From-impls for 2 and 3 components; the 1-component prefix is constructed from a Vec<PartitionKeyValue> so the dispatcher sees an HPK partial PK rather than a single-hash key. Refs PR #4319 follow-up: HPK partial-PK paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../tests/emulator_tests/gateway20_e2e.rs | 147 ++++++++++++++++++ 1 file changed, 147 insertions(+) diff --git a/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs b/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs index bce400b1e43..c0091337444 100644 --- a/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs +++ b/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs @@ -385,3 +385,150 @@ pub async fn gateway20_operator_override_at_sdk_boundary() -> Result<(), Box Result<(String, azure_data_cosmos::clients::ContainerClient), Box<dyn std::error::Error>> { + let unique = azure_core::Uuid::new_v4(); + let db_name = format!("gw20-test-db-{unique}"); + let container_name = format!("gw20-test-hpk-container-{unique}"); + + client.create_database(&db_name, None).await?; + let db_client = client.database_client(&db_name); + + let pk_def = PartitionKeyDefinition::from(("/tenantId", "/userId", "/sessionId")); + let properties = ContainerProperties::new(container_name.clone(), pk_def); + db_client.create_container(properties, None).await?; + let container_client = db_client.container_client(&container_name).await?; + + Ok((db_name, container_client)) +} + +#[derive(Debug, Deserialize, Serialize, PartialEq, Eq, Clone)] +struct Gw20HpkItem { + id: String, + #[serde(rename = "tenantId")] + tenant_id: String, + #[serde(rename = "userId")] + user_id: String, + #[serde(rename = "sessionId")] + session_id: String, + value: i64, +} + +/// Round-trip exercises Gateway 2.0 against a 3-component hierarchical +/// partition key container, asserting both the **full PK** point-op path +/// and the **partial PK** range-dispatch path
(`x-ms-thinclient-range-min` +/// / `-max`) discussed in the Gateway 2.0 spec test matrix +/// ("HPK + Gateway 2.0: full vs partial PK"). +/// +/// 1. Inserts items spread across two tenants × two users × two sessions. +/// 2. Reads one item back via its full 3-component PK (point op → EPK token). +/// 3. Queries with a **1-component prefix** (`tenantId` only) and asserts +/// that exactly that tenant's items come back, with no cross-tenant bleed, +/// across however many pages the proxy fans out into. +/// +/// The point-vs-range header emission is asserted at unit level in +/// `gateway20_dispatch::tests`; this E2E test guards the SDK-public surface +/// against regressions where partial-PK queries silently degrade to +/// single-partition or fail. +/// +/// TODO: tighten the diagnostics check to assert `TransportKind::Gateway20` +/// once the SDK surfaces the driver transport kind. +#[tokio::test] +#[cfg_attr( + not(test_category = "gateway20"), + ignore = "requires test_category 'gateway20' and AZURE_COSMOS_GW20_ENDPOINT/_KEY" +)] +pub async fn gateway20_hpk_full_and_partial_partition_key_round_trip( +) -> Result<(), Box<dyn std::error::Error>> { + use azure_data_cosmos::{PartitionKey, PartitionKeyValue}; + + let Some((endpoint, key)) = live_credentials() else { + return Ok(()); + }; + + let client = build_client(&endpoint, &key, false).await?; + let (db_name, container) = provision_database_and_hpk_container(&client).await?; + + let target_tenant = format!("tenant-{}", azure_core::Uuid::new_v4()); + let other_tenant = format!("tenant-{}", azure_core::Uuid::new_v4()); + + // Two users × two sessions per tenant => 4 items per tenant.
+ let mut expected_target_ids = Vec::new(); + for tenant in [target_tenant.as_str(), other_tenant.as_str()] { + for user_idx in 0..2 { + for session_idx in 0..2 { + let user_id = format!("user-{user_idx}"); + let session_id = format!("session-{session_idx}"); + let id = format!("{tenant}-{user_id}-{session_id}"); + if tenant == target_tenant { + expected_target_ids.push(id.clone()); + } + let item = Gw20HpkItem { + id: id.clone(), + tenant_id: tenant.to_string(), + user_id: user_id.clone(), + session_id: session_id.clone(), + value: i64::from(user_idx * 10 + session_idx), + }; + // PartitionKey tuple impls require owned types (the underlying + // `PartitionKeyValue: From<&'static str>` impl is the only + // borrow-friendly one) — clone strings into the tuple. + let pk = PartitionKey::from((tenant.to_string(), user_id, session_id)); + container.create_item(pk, &item, None).await?; + } + } + } + + // Full HPK point read (3-of-3 components → EPK token path). + let full_pk = PartitionKey::from(( + target_tenant.clone(), + "user-0".to_string(), + "session-0".to_string(), + )); + let full_id = format!("{target_tenant}-user-0-session-0"); + let read_resp = container + .read_item::<Gw20HpkItem>(full_pk, &full_id, None) + .await?; + let item: Gw20HpkItem = read_resp.into_model()?; + assert_eq!(item.id, full_id); + assert_eq!(item.tenant_id, target_tenant); + + // Partial HPK query (1-of-3 components → range header path). + // PartitionKey only has tuple From-impls for 2 and 3 components; for a + // single-component prefix, construct it from a Vec<PartitionKeyValue> so + // the dispatcher sees a 1-component value against a 3-path container. + let partial_pk = PartitionKey::from(vec![PartitionKeyValue::from(target_tenant.clone())]); + let query = Query::from("SELECT * FROM c"); + let mut pages = container + .query_items::<Gw20HpkItem>(query, partial_pk, None)?
+ .into_pages(); + + let mut returned_ids: Vec<String> = Vec::new(); + let mut pages_seen = 0_usize; + while let Some(page) = pages.next().await { + let page = page?; + pages_seen += 1; + assert!(page.diagnostics().activity_id().is_some()); + for it in page.items() { + assert_eq!( + it.tenant_id, target_tenant, + "partial-PK query must not bleed across tenants" + ); + returned_ids.push(it.id.clone()); + } + } + assert!(pages_seen >= 1, "expected at least one query page"); + expected_target_ids.sort(); + returned_ids.sort(); + assert_eq!(returned_ids, expected_target_ids); + + drop_database(&client, &db_name).await; + Ok(()) +} From 0aa52f11804e6ee5bc37dfb7384043c32691455f Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Thu, 30 Apr 2026 11:06:51 -0700 Subject: [PATCH 38/48] Propagate x-ms-continuation into Gateway 2.0 RNTBD frame The wrap path was dropping the inbound x-ms-continuation header before serializing the RNTBD frame, so paginated queries on Gateway 2.0 always restarted from page one and never advanced. - Add RntbdRequestToken::ContinuationToken (0x0006, String) to mirror Java's RntbdRequestHeader.ContinuationToken and the same not-in-thinClientProxyExcludedSet behavior in .NET. - Plumb the inbound x-ms-continuation header into the RNTBD metadata stream as a string token; values are passed through verbatim (including empty strings) for symmetry with the unwrap side and the .NET/Java implementations. - Document the request/response continuation-token format in GATEWAY_20_SPEC.md. - Add 3 driver-layer unit tests covering present/absent/empty header scenarios, plus an emulator-only E2E test (gateway20_query_paginates_via_continuation_tokens) that forces multi-page pagination and asserts no row is returned twice.
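A toy pager reproduces the bug this patch fixes. This is a minimal sketch with hypothetical names. `FakeFeed` and `drain` are not SDK types, and this feed's continuation token happens to encode an offset. Real Cosmos continuation tokens are opaque and must be forwarded unchanged. The setup mirrors the E2E test: 7 rows served 2 at a time. Forwarding the token visits every row exactly once. Dropping the token restarts at page one, and the duplicate-id check fires, which is the symptom `gateway20_query_paginates_via_continuation_tokens` asserts against.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory paged feed. Its continuation token happens to be
/// an offset, but callers treat it as an opaque string.
pub struct FakeFeed {
    pub items: Vec<u32>,
    pub page_size: usize,
}

impl FakeFeed {
    /// Returns one page plus the token for the next page, if any.
    pub fn fetch_page(&self, continuation: Option<&str>) -> (Vec<u32>, Option<String>) {
        let start = continuation
            .and_then(|c| c.parse::<usize>().ok())
            .unwrap_or(0)
            .min(self.items.len());
        // `max(1)` guards against a zero page size looping forever.
        let end = (start + self.page_size.max(1)).min(self.items.len());
        let next = (end < self.items.len()).then(|| end.to_string());
        (self.items[start..end].to_vec(), next)
    }
}

/// Drains the feed. With `forward_token = true` each continuation is passed
/// back verbatim; with `false` it is dropped, reproducing the pre-fix bug.
/// A repeated item is reported as an error, like the E2E test's assertion.
pub fn drain(feed: &FakeFeed, forward_token: bool) -> Result<(usize, Vec<u32>), String> {
    let mut seen = HashSet::new();
    let mut items = Vec::new();
    let mut continuation: Option<String> = None;
    let mut pages = 0;
    loop {
        let token = if forward_token { continuation.as_deref() } else { None };
        let (page, next) = feed.fetch_page(token);
        pages += 1;
        for item in page {
            if !seen.insert(item) {
                return Err(format!("item {item} returned twice"));
            }
            items.push(item);
        }
        match next {
            Some(next_token) => continuation = Some(next_token),
            None => return Ok((pages, items)),
        }
    }
}
```

With 7 items and a page size of 2, forwarding yields 4 pages. Without forwarding, the second request returns item 0 again.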
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../tests/emulator_tests/gateway20_e2e.rs | 90 +++++++++++++++- .../docs/GATEWAY_20_SPEC.md | 9 ++ .../driver/transport/gateway20_dispatch.rs | 101 ++++++++++++++++++ .../src/driver/transport/rntbd/tokens.rs | 14 +++ 4 files changed, 213 insertions(+), 1 deletion(-) diff --git a/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs b/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs index c0091337444..77f041a8293 100644 --- a/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs +++ b/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs @@ -225,7 +225,95 @@ pub async fn gateway20_query_streams_through_thin_client() -> Result<(), Box Result<(), Box<dyn std::error::Error>> { + use azure_core::http::headers::{HeaderName, HeaderValue}; + use azure_data_cosmos::options::{OperationOptions, QueryOptions}; + use std::collections::{HashMap, HashSet}; + + let Some((endpoint, key)) = live_credentials() else { + return Ok(()); + }; + + let client = build_client(&endpoint, &key, false).await?; + let (db_name, container) = provision_database_and_container(&client).await?; + + let pk_value = format!("pk-{}", azure_core::Uuid::new_v4()); + let total_items: usize = 7; + for i in 0..total_items { + let item = Gw20TestItem { + id: format!("page-item-{i}"), + pk: pk_value.clone(), + value: i as i64, + label: format!("row-{i}"), + }; + container.create_item(&pk_value, &item, None).await?; + } + + let mut custom_headers: HashMap<HeaderName, HeaderValue> = HashMap::new(); + custom_headers.insert( + HeaderName::from_static("x-ms-max-item-count"), + HeaderValue::from_static("2"), + ); + let query_options = QueryOptions::default() + .with_operation_options(OperationOptions::default().with_custom_headers(custom_headers)); + + let query = Query::from("SELECT * FROM c ORDER BY c.value"); + let mut pages = container + .query_items::<Gw20TestItem>(query, pk_value.clone(), Some(query_options))?
+ .into_pages(); + + let mut pages_seen = 0_usize; + let mut ids_seen: HashSet<String> = HashSet::new(); + while let Some(page) = pages.next().await { + let page = page?; + pages_seen += 1; + assert!( + page.diagnostics().activity_id().is_some(), + "every Gateway 2.0 page must surface an activity-id", + ); + for item in page.items() { + assert!( + ids_seen.insert(item.id.clone()), + "item {} returned twice — pagination did not advance (continuation token not propagated)", + item.id, + ); + } + } + + assert!( + pages_seen > 1, + "expected continuation-driven pagination to produce more than one page (got {pages_seen})", + ); + assert_eq!( + ids_seen.len(), + total_items, + "expected all {total_items} inserted rows; saw {} unique ids across {pages_seen} pages", + ids_seen.len(), + ); + + drop_database(&client, &db_name).await; + Ok(()) +} + /// /// TODO: tighten the diagnostics check to assert `TransportKind::Gateway20` /// once the SDK surfaces the driver transport kind on batch diagnostics. diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index e18a4170ee8..93e2cb419e8 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -205,6 +205,15 @@ The Rust deserializer **must** treat the RNTBD response metadata-token stream as - **Unknown token type IDs MUST be silently skipped** (consume `length` bytes and continue) — the deserializer must NOT panic, return an error, or fail the response, and must NOT log per-token (silent skip is the contract). The proxy is free to add new metadata tokens at any time and the driver must remain forward-compatible across proxy upgrades that ship before the corresponding Rust release.
This silent-tolerance behavior is the *implementation* of the `IgnoreUnknownRntbdTokens` capability bit advertised over the `x-ms-cosmos-sdk-supportedcapabilities` header (see "SDK-supported-capabilities advertisement" below) — the proxy/backend assumes the SDK will not surface or warn on unknown tokens, so per-token logging is unnecessary noise. - **Inverse contract on the request side**: the request serializer drops headers that appear in `thinClientProxyExcludedSet` (see §"RNTBD Request Wire Format" Notes column). That set enumerates headers the proxy does not understand on the inbound RNTBD frame; emitting them would be either ignored or rejected. +##### Continuation-token format (request and response) + +Continuation tokens are **opaque server-issued strings** in both directions; the SDK never parses, validates, or rewrites them. The wire format is a length-prefixed UTF-8 string token mirroring Java's RNTBD encoding: + +- **Request side** — `RntbdRequestToken::ContinuationToken` (ID `0x0006`, `TokenType::String`). When the inbound HTTP request carries `x-ms-continuation`, the wrap path serializes the value verbatim into the RNTBD metadata stream and **strips** the header from the outer HTTP request (the outer body is the RNTBD frame; metadata never duplicates onto outer headers). Empty values are passed through as zero-length string tokens — the wrap path does not infer intent from emptiness, matching the unwrap side and the .NET/Java behavior. +- **Response side** — `RntbdResponseToken::ContinuationToken` (ID `0x0003`, `TokenType::String`). The unwrap path forwards the token value verbatim into the synthetic HTTP response's `x-ms-continuation` header. 
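The request-side encoding described above can be sketched as follows. This is a minimal, hypothetical illustration rather than the driver's `Token` implementation: the 2-byte ID plus 1-byte type header follows `Token::encoded_len` (`2 + 1 + value.encoded_len()`) and `0x0006` is the continuation ID added by this change, but the little-endian byte order, the 2-byte length prefix, and the `0x08` type tag are assumptions to verify against `rntbd/tokens.rs`. The helper names are illustrative only.

```rust
/// Request token ID for the continuation token
/// (`RntbdRequestToken::ContinuationToken` in this change).
const CONTINUATION_TOKEN_ID: u16 = 0x0006;

/// ASSUMPTION: type tag for a u16-length-prefixed UTF-8 string token.
/// Verify against the driver's `TokenType::String` before relying on it.
const ASSUMED_STRING_TYPE_TAG: u8 = 0x08;

/// Hypothetical encoder: `[id: u16 LE][type: u8][len: u16 LE][UTF-8 bytes]`.
/// Returns `None` if the value is too long for a u16 length prefix.
fn encode_continuation_token(value: &str) -> Option<Vec<u8>> {
    let bytes = value.as_bytes();
    if bytes.len() > u16::MAX as usize {
        return None;
    }
    let len = bytes.len() as u16;

    let mut out = Vec::with_capacity(2 + 1 + 2 + bytes.len());
    out.extend_from_slice(&CONTINUATION_TOKEN_ID.to_le_bytes());
    out.push(ASSUMED_STRING_TYPE_TAG);
    out.extend_from_slice(&len.to_le_bytes());
    // The token is opaque: bytes are copied verbatim, never parsed or rewritten.
    out.extend_from_slice(bytes);
    Some(out)
}

/// Hypothetical wrap-path rule: an absent `x-ms-continuation` header emits no
/// token, while an empty header emits a zero-length token. In this sketch an
/// oversized value also yields `None`.
fn continuation_token_for(header: Option<&str>) -> Option<Vec<u8>> {
    header.and_then(encode_continuation_token)
}
```

For example, `continuation_token_for(Some(""))` yields a 5-byte token whose length field is zero, matching the "empty values pass through" rule, and `continuation_token_for(None)` yields nothing.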
+ +Identical semantics to .NET (`ThinClientStoreClient.cs` / `ThinClientTransportSerializer.cs`, which contain no continuation-specific logic and rely on the standard gateway path) and Java (`RntbdRequestHeader.ContinuationToken` is *not* in `thinClientProxyExcludedSet`, so it traverses the same encode/decode path as standard direct-mode RNTBD). There is no Gateway-2.0-specific token format, base64 wrapper, or version prefix; pagination cursors round-trip byte-for-byte. + Phase 6's "RNTBD unknown-token tolerance" unit test pins this behavior: a hand-crafted response frame containing a synthetic unrecognized token ID must round-trip without error and surface every recognized token correctly. #### SDK-supported-capabilities advertisement diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/gateway20_dispatch.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/gateway20_dispatch.rs index 827cc0494b4..28487d07add 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/gateway20_dispatch.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/gateway20_dispatch.rs @@ -33,6 +33,7 @@ const X_MS_ACTIVITY_ID: HeaderName = HeaderName::from_static("x-ms-activity-id") const X_MS_DATE: HeaderName = HeaderName::from_static("x-ms-date"); const X_MS_LSN: HeaderName = HeaderName::from_static("x-ms-lsn"); const X_MS_GLOBAL_COMMITTED_LSN: HeaderName = HeaderName::from_static("x-ms-global-committed-lsn"); +const X_MS_CONTINUATION: HeaderName = HeaderName::from_static("x-ms-continuation"); static TRANSPORT_REQUEST_ID: AtomicU32 = AtomicU32::new(0); /// Inputs resolved by the operation pipeline before a Gateway 2.0 dispatch. 
@@ -88,6 +89,9 @@ pub(crate) fn wrap_request_for_gateway20( metadata.push(Token::sdk_supported_capabilities( SUPPORTED_CAPABILITIES_BITS, )); + if let Some(continuation) = request.headers.get_optional_str(&X_MS_CONTINUATION) { + metadata.push(Token::continuation_token(continuation.to_owned())); + } let frame = RntbdRequestFrame { resource_type: inputs.resource_type, @@ -723,6 +727,103 @@ mod tests { assert_eq!(error.kind(), &ErrorKind::DataConversion); } + #[test] + fn wrap_propagates_continuation_token_into_rntbd_metadata() { + let mut request = signed_request(None); + request.headers.insert(X_MS_CONTINUATION, "page-token-1"); + let auth_context = + AuthorizationContext::new(Method::Get, ResourceType::Document, "dbs/db1/colls/coll1"); + + let wrapped = wrap_request_for_gateway20( + &request, + &WrapInputs { + auth_context: &auth_context, + operation_type: OperationType::Query, + resource_type: ResourceType::Document, + partition_key: None, + partition_key_definition: None, + effective_consistency: DefaultConsistencyLevel::Session, + account_name: Some("account"), + }, + ) + .unwrap(); + let parsed = parse_wrapped_request(&wrapped, 10); + + assert_eq!( + parsed.tokens[&0x0006], + ParsedTokenValue::String("page-token-1".into()), + "continuation token should be encoded as string token 0x0006", + ); + assert!( + wrapped + .headers + .get_optional_str(&X_MS_CONTINUATION) + .is_none(), + "x-ms-continuation header should not be forwarded on the outer HTTP request", + ); + } + + #[test] + fn wrap_omits_continuation_token_when_header_absent() { + let request = signed_request(None); + let auth_context = + AuthorizationContext::new(Method::Get, ResourceType::Document, "dbs/db1/colls/coll1"); + + let wrapped = wrap_request_for_gateway20( + &request, + &WrapInputs { + auth_context: &auth_context, + operation_type: OperationType::Query, + resource_type: ResourceType::Document, + partition_key: None, + partition_key_definition: None, + effective_consistency: 
DefaultConsistencyLevel::Session, + account_name: Some("account"), + }, + ) + .unwrap(); + let parsed = parse_wrapped_request(&wrapped, 9); + + assert!( + !parsed.tokens.contains_key(&0x0006), + "continuation token should be absent when no x-ms-continuation header is present", + ); + } + + #[test] + fn wrap_emits_empty_continuation_token_when_header_value_empty() { + // Symmetry with .NET (`ThinClientStoreClient.PrepareRequestForProxyAsync`), + // Java (`RntbdRequestHeader.ContinuationToken` is *not* in + // `thinClientProxyExcludedSet`), and the unwrap side which forwards + // empty continuation strings verbatim. Continuation is opaque on the + // wire — the wrap path does not infer intent from emptiness. + let mut request = signed_request(None); + request.headers.insert(X_MS_CONTINUATION, ""); + let auth_context = + AuthorizationContext::new(Method::Get, ResourceType::Document, "dbs/db1/colls/coll1"); + + let wrapped = wrap_request_for_gateway20( + &request, + &WrapInputs { + auth_context: &auth_context, + operation_type: OperationType::Query, + resource_type: ResourceType::Document, + partition_key: None, + partition_key_definition: None, + effective_consistency: DefaultConsistencyLevel::Session, + account_name: Some("account"), + }, + ) + .unwrap(); + let parsed = parse_wrapped_request(&wrapped, 10); + + assert_eq!( + parsed.tokens[&0x0006], + ParsedTokenValue::String(String::new()), + "empty continuation header should be emitted as a zero-length string token", + ); + } + #[test] fn wrap_only_keeps_user_agent_and_activity_id_headers() { let request = signed_request(None); diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/tokens.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/tokens.rs index 7311ee3f83c..125a717b042 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/tokens.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/rntbd/tokens.rs @@ -341,6 +341,17 @@ impl Token { ) } + 
/// Pagination cursor echoed back to the proxy on subsequent feed/query + /// requests. Wire format matches Java's `RntbdRequestHeader.ContinuationToken` + /// (ID 0x0006, string) — the SDK passes the value through unchanged so + /// the backend can resume from the previous offset. + pub(crate) fn continuation_token(value: String) -> Self { + Self::new( + RntbdRequestToken::ContinuationToken.into(), + TokenValue::String(value), + ) + } + /// Returns the number of bytes this token occupies on the wire. pub(super) fn encoded_len(&self) -> usize { 2 + 1 + self.value.encoded_len() @@ -371,6 +382,7 @@ pub(crate) enum RntbdRequestToken { AuthorizationToken, PayloadPresent, Date, + ContinuationToken, ConsistencyLevel, DatabaseName, CollectionName, @@ -389,6 +401,7 @@ impl TryFrom<u16> for RntbdRequestToken { 0x0001 => Ok(Self::AuthorizationToken), 0x0002 => Ok(Self::PayloadPresent), 0x0003 => Ok(Self::Date), + 0x0006 => Ok(Self::ContinuationToken), 0x0010 => Ok(Self::ConsistencyLevel), 0x0015 => Ok(Self::DatabaseName), 0x0016 => Ok(Self::CollectionName), @@ -408,6 +421,7 @@ impl From<RntbdRequestToken> for u16 { RntbdRequestToken::AuthorizationToken => 0x0001, RntbdRequestToken::PayloadPresent => 0x0002, RntbdRequestToken::Date => 0x0003, + RntbdRequestToken::ContinuationToken => 0x0006, RntbdRequestToken::ConsistencyLevel => 0x0010, RntbdRequestToken::DatabaseName => 0x0015, RntbdRequestToken::CollectionName => 0x0016, From 88a55f8736f11906b2789de818f85835fe804581 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Thu, 30 Apr 2026 12:05:34 -0700 Subject: [PATCH 39/48] Flip Gateway 2.0 default to enabled and rewrite docs Gateway 2.0 (the next-generation Cosmos DB dataplane transport) is now on by default whenever the account advertises a thin-client endpoint and HTTP/2 is allowed on the connection-pool options. Operators can still opt out per-client via `ConnectionPoolOptionsBuilder::with_gateway20_disabled(true)` (driver) or `CosmosClientBuilder::with_gateway20_disabled(true)` (SDK).
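The flag resolution described above reduces to a small truth table. The sketch below is a hypothetical free function (`effective_gateway20_disabled` is not a driver API) that mirrors the resolution logic in `ConnectionPoolOptionsBuilder`. It models only the pool-level flag and the HTTP/2 prerequisite, because whether the account advertises a thin-client endpoint is not part of the pool options.

```rust
/// Hypothetical helper mirroring the pool-level resolution.
/// `explicit` is the builder's `gateway20_disabled` override (unset = `None`).
fn effective_gateway20_disabled(explicit: Option<bool>, http2_allowed: bool) -> bool {
    // An unset override means Gateway 2.0 stays enabled (the new default).
    let explicit_disabled = explicit.unwrap_or(false);
    // HTTP/2 is a hard prerequisite: without it Gateway 2.0 is always off.
    explicit_disabled || !http2_allowed
}
```

With HTTP/2 allowed, `None` and `Some(false)` both leave Gateway 2.0 enabled and only `Some(true)` disables it. With HTTP/2 disallowed, every input resolves to disabled.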
The doc comments on both opt-out methods (and on the underlying field + accessor) are rewritten to: * Drop the negative-term explanation that framed Gateway 2.0 as a pre-GA opt-in. Gateway 2.0 is on by default and the docs now say so directly. * Add a 'Latency caveat' section noting that Gateway 2.0 is not covered by the Cosmos DB regional latency SLA. Workloads with strict latency requirements should evaluate before relying on it. Test impact: the only test asserting on the default is `connection_pool::tests::connection_pool_options_builder_defaults`, which now asserts that Gateway 2.0 is enabled (`!gateway20_disabled()`). Other tests either set the flag explicitly or force HTTP/2 off (which short-circuits to disabled regardless of the default). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../src/clients/cosmos_client_builder.rs | 56 +++++++++++-------- .../src/options/connection_pool.rs | 42 +++++++------- 2 files changed, 55 insertions(+), 43 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client_builder.rs b/sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client_builder.rs index bc727d3e8ff..13047835266 100644 --- a/sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client_builder.rs +++ b/sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client_builder.rs @@ -93,11 +93,14 @@ pub struct CosmosClientBuilder { fault_injection_builder: Option, /// Fallback endpoints tried when the primary endpoint is unavailable. backup_endpoints: Vec, - /// When `true` (the default pre-GA), the Gateway 2.0 ("thin client") - /// transport is suppressed and the SDK uses only the standard gateway. - /// Set to `false` via [`with_gateway20_disabled`](Self::with_gateway20_disabled) - /// to opt in to Gateway 2.0 routing once the account advertises a - /// thin-client endpoint. + /// Operator override for the Gateway 2.0 ("thin client") transport.
+ /// + /// `None` (the default) leaves the underlying driver in charge of + /// routing — Gateway 2.0 is selected automatically whenever the + /// account advertises a thin-client endpoint and HTTP/2 is allowed. + /// `Some(true)` forces every request through the standard gateway + /// transport via [`with_gateway20_disabled`](Self::with_gateway20_disabled); + /// `Some(false)` explicitly opts in (matching the default behaviour). gateway20_disabled: Option<bool>, } @@ -174,31 +177,36 @@ impl CosmosClientBuilder { self } - /// Controls whether the Gateway 2.0 ("thin client") transport is - /// suppressed for this client. + /// Disables the Gateway 2.0 ("thin client") transport for this client. /// - /// The Gateway 2.0 transport is the next-generation Cosmos DB dataplane - /// transport — it terminates SDK connections at a regional thin-client - /// proxy that forwards RNTBD-over-HTTP/2 to the backend. Today it is - /// **disabled by default** while the implementation is still being - /// rolled out; this default will flip before GA. + /// Gateway 2.0 is the next-generation Cosmos DB dataplane transport: + /// SDK connections terminate at a regional thin-client proxy that + /// forwards RNTBD-over-HTTP/2 to the backend. **Gateway 2.0 is enabled + /// by default** — whenever the account advertises a thin-client endpoint + /// the SDK routes eligible dataplane operations through it and falls + /// back to the standard gateway only for operations Gateway 2.0 cannot + /// serve (e.g. metadata requests or accounts that do not advertise a + /// thin-client endpoint). /// - /// * Pass `true` to keep the standard gateway path (current default). - /// * Pass `false` to opt in to Gateway 2.0 when the account advertises - /// a thin-client endpoint. The standard gateway remains as the - /// automatic fallback transport for any request that cannot use - /// Gateway 2.0 (e.g., metadata requests, accounts without a - /// thin-client endpoint).
+ /// Pass `true` to opt out and force every request through the standard + /// gateway transport. The standard gateway path remains supported and + /// stable — disabling Gateway 2.0 is the recommended workaround if you + /// hit a regression on the new transport. /// - /// The negative-term name (`gateway20_disabled`) is intentional and - /// follows the SDK's negative-term policy for behaviour-disabling - /// flags (see `GATEWAY_20_SPEC.md` §3): the operator's intent on the - /// wire reads as "disable this thing" rather than "do not allow this - /// thing", which composes cleanly with future features. + /// # Latency caveat + /// + /// Gateway 2.0 traffic flows through a thin-client proxy that is + /// **not currently covered by the regional Cosmos DB latency SLA**. + /// Workloads with strict P99 latency requirements should opt out via + /// `with_gateway20_disabled(true)` until the proxy reaches general + /// availability. The extra hop also means Gateway 2.0 may add measurable + /// latency relative to the standard gateway in some regions. /// /// # Arguments /// - /// * `disabled` - `true` to suppress Gateway 2.0; `false` to opt in. + /// * `disabled` - `true` to suppress Gateway 2.0 and force the standard + /// gateway transport; `false` (or leaving the builder untouched) keeps + /// the default Gateway 2.0 behaviour. pub fn with_gateway20_disabled(mut self, disabled: bool) -> Self { self.gateway20_disabled = Some(disabled); self diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/options/connection_pool.rs b/sdk/cosmos/azure_data_cosmos_driver/src/options/connection_pool.rs index 0ca85de3632..15a1958fd4d 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/options/connection_pool.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/options/connection_pool.rs @@ -208,12 +208,10 @@ impl ConnectionPoolOptions { /// Returns whether Gateway 2.0 is disabled for this pool. 
/// - /// Gateway 2.0 is enabled by default; this flag is the single supported - /// disablement mechanism (per `GATEWAY_20_SPEC.md` §3, all Gateway 2.0 - /// flags use a negative-term name so that defaults mean Gateway 2.0 is - /// enabled). When `true`, the driver routes every request through the - /// standard gateway transport even when the account advertises a - /// thin-client endpoint. + /// Gateway 2.0 is enabled by default whenever the account advertises a + /// thin-client endpoint and HTTP/2 is allowed. When this method returns + /// `true` the driver routes every request through the standard gateway + /// transport, regardless of the account advertisement. /// /// Gateway 2.0 also requires HTTP/2: when HTTP/2 is disabled, this method /// returns `true` regardless of how the builder was configured. @@ -507,9 +505,16 @@ impl ConnectionPoolOptionsBuilder { /// request through the standard gateway transport regardless of the /// account advertisement (operator override). /// - /// This is the single supported disablement mechanism per - /// `GATEWAY_20_SPEC.md` §3 — there is intentionally no - /// `AZURE_COSMOS_*` environment variable that toggles Gateway 2.0. + /// There is intentionally no `AZURE_COSMOS_*` environment variable that + /// toggles Gateway 2.0 — the override must be applied programmatically + /// via this method. + /// + /// # Latency caveat + /// + /// Gateway 2.0 traffic flows through a thin-client proxy that is **not + /// currently covered by the regional Cosmos DB latency SLA**. Workloads + /// with strict P99 latency requirements should call this method with + /// `true` until the proxy reaches general availability. pub fn with_gateway20_disabled(mut self, value: bool) -> Self { self.gateway20_disabled = Some(value); self @@ -551,15 +556,13 @@ impl ConnectionPoolOptionsBuilder { ValidationBounds::none(), )?; - // Gateway 2.0 is currently disabled by default while the implementation - // is still in pre-GA. 
Per `GATEWAY_20_SPEC.md` §3, the field uses a - // negative-term name (`gateway20_disabled`) so that the default state - // can be flipped to "enabled" by changing only the literal below; no - // call sites or environment variables need to change. There is - // intentionally no `AZURE_COSMOS_*` env var that toggles Gateway 2.0. - // - // TODO: Change to `false` (Gateway 2.0 enabled by default) before GA. - let explicit_disabled = self.gateway20_disabled.unwrap_or(true); + // Gateway 2.0 is enabled by default whenever HTTP/2 is allowed and + // the account advertises a thin-client endpoint. The flag uses a + // negative-term name so that the absence of an opt-out is the on + // state; operators disable Gateway 2.0 by setting this to `true`. + // There is intentionally no `AZURE_COSMOS_*` env var that toggles + // Gateway 2.0 — the override must be applied programmatically. + let explicit_disabled = self.gateway20_disabled.unwrap_or(false); // HTTP/2 is a hard prerequisite for Gateway 2.0 — when HTTP/2 is off // the pool is effectively gateway20-disabled regardless of the flag. let effective_gateway20_disabled = explicit_disabled || !effective_is_http2_allowed; @@ -834,7 +837,8 @@ mod tests { Duration::from_millis(65_000) ); assert!(options.is_http2_allowed()); - assert!(options.gateway20_disabled()); + // Gateway 2.0 is enabled by default whenever HTTP/2 is allowed. + assert!(!options.gateway20_disabled()); assert_eq!( options.emulator_server_cert_validation(), EmulatorServerCertValidation::Enabled From 31d2e6456c07dc38454688991af1b28d137a2431 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Thu, 30 Apr 2026 12:05:56 -0700 Subject: [PATCH 40/48] Consolidate Gateway 2.0 live tests into main Cosmos pipeline MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The standalone `sdk/cosmos/ci-gateway20.yml` pipeline is removed.
The Gateway 2.0 ("thin client") live tests now run as a second `LiveTestMatrixConfigs` entry on the main `sdk/cosmos/ci.yml` pipeline. This mirrors how the Java Cosmos SDK plumbs its thin-client live tests in `sdk/cosmos/tests.yml` (Azure/azure-sdk-for-java). The new entry (`Cosmos_gateway20_live_test`) points at `live-gateway20-matrix.json` and reuses the same `azure-sdk-tests-cosmos` service connection. Two pipeline-level `EnvVars` are wired in so the tests can connect to a pre-provisioned thin-client account that is not created per-pipeline-run: * `AZURE_COSMOS_GW20_ENDPOINT` ← `$(thinclient-test-endpoint)` * `AZURE_COSMOS_GW20_KEY` ← `$(thinclient-test-key)` (NOTE: those secret variable names follow Java's convention. They may need to be renamed to whatever the Cosmos service connection actually exposes; this can be verified on the next live-test run.) The matrix machinery still requires `ArmTemplateParameters`, so the deploy step continues to create a throwaway account; the Gateway 2.0 tests just ignore it and connect via the env vars instead. The existing test-category gating (`gateway20` / `gateway20_multi_region`) flows through the bicep template into `COSMOS_RUSTFLAGS`, which gates the `#[cfg(test_category="gateway20")]` test functions — independent of the account the tests connect to. Doc comments in `gateway20_e2e.rs` and the `GATEWAY_20_SPEC.md` file-changes table are updated to reference the consolidated pipeline. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../tests/emulator_tests/gateway20_e2e.rs | 4 +- .../docs/GATEWAY_20_SPEC.md | 3 +- sdk/cosmos/ci-gateway20.yml | 52 ------------------- sdk/cosmos/ci.yml | 26 ++++++++++ 4 files changed, 31 insertions(+), 54 deletions(-) delete mode 100644 sdk/cosmos/ci-gateway20.yml diff --git a/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs b/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs index 6d6b996d28e..323b0fe66d1 100644 --- a/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs +++ b/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs @@ -8,7 +8,9 @@ //! account. The endpoint and primary key are read from the //! `AZURE_COSMOS_GW20_ENDPOINT` and `AZURE_COSMOS_GW20_KEY` environment //! variables and gated by the `gateway20` test category. They are skipped by -//! default; the dedicated `ci-gateway20.yml` pipeline sets the matrix entry's +//! default; the main Cosmos Rust pipeline (`sdk/cosmos/ci.yml`) injects those +//! env vars from the `azure-sdk-tests-cosmos` service connection's secret +//! variable group, and the `Cosmos_gateway20_live_test` matrix entry sets the //! `testCategory` to `gateway20` (or `gateway20_multi_region`) so the tests //! run in CI against the live account. //! diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index 93e2cb419e8..09ffefff4b7 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -561,7 +561,8 @@ A **new dedicated CI pipeline** is required for gateway 2.0 live tests. 
Gateway | Action | File | Purpose | | --- | --- | --- | -| NEW | `sdk/cosmos/ci-gateway20.yml` | Gateway 2.0 live tests pipeline definition (uses pre-provisioned account) | +| EDIT | `sdk/cosmos/ci.yml` | Add a second `LiveTestMatrixConfigs` entry (`Cosmos_gateway20_live_test`) that points at `live-gateway20-matrix.json`, plus an `EnvVars` block that injects `AZURE_COSMOS_GW20_ENDPOINT` / `AZURE_COSMOS_GW20_KEY` from the `azure-sdk-tests-cosmos` service connection. Mirrors Java's `sdk/cosmos/tests.yml` thin-client setup. | +| NEW | `sdk/cosmos/live-gateway20-matrix.json` | Gateway 2.0 live test matrix (single-region + multi-region; `testCategory` = `gateway20` / `gateway20_multi_region`). The pre-provisioned account is supplied via the env vars above; the matrix's `ArmTemplateParameters` block is preserved so the deploy step still runs even though the per-run account is unused. | | EDIT | `sdk/cosmos/live-platform-matrix.json` | Add gateway 2.0 test matrix entry | #### Test Coverage Matrix diff --git a/sdk/cosmos/ci-gateway20.yml b/sdk/cosmos/ci-gateway20.yml deleted file mode 100644 index 18ee0c4b47d..00000000000 --- a/sdk/cosmos/ci-gateway20.yml +++ /dev/null @@ -1,52 +0,0 @@ -# NOTE: Please refer to https://aka.ms/azsdk/engsys/ci-yaml before editing this file. -# -# Gateway 2.0 live-test pipeline. -# -# This pipeline runs the Cosmos client tests against a pre-provisioned Gateway 2.0 -# (a.k.a. "thin client") account. It is a sibling of `ci.yml` and shares the same -# template (`archetype-sdk-client.yml`) but uses a separate matrix -# (`live-gateway20-matrix.json`) so that the matrix entries map onto the dedicated -# test categories `gateway20` and `gateway20_multi_region`. -# -# The Gateway 2.0 account is **not** provisioned per-pipeline-run — it must be -# created out-of-band by the engineering system. 
The pipeline reads its endpoint -# and key from these pipeline secrets: -# -# AZURE_COSMOS_GW20_ENDPOINT — endpoint URL -# AZURE_COSMOS_GW20_KEY — primary master key -# -# The pipeline triggers only on PRs and manual / scheduled dispatch (no `branches` -# trigger). This avoids accidental runs on every main push while still allowing -# developers to validate Gateway 2.0 changes from a PR. - -trigger: none - -pr: - branches: - include: - - main - - hotfix/* - - release/* - paths: - include: - - sdk/cosmos/ - -parameters: -- name: RunLiveTests - displayName: Run live tests - type: boolean - default: true - -extends: - template: /eng/pipelines/templates/stages/archetype-sdk-client.yml - parameters: - ServiceDirectory: cosmos - RunLiveTests: ${{ parameters.RunLiveTests }} - CloudConfig: - Public: - ServiceConnection: azure-sdk-tests-cosmos - LiveTestMatrixConfigs: - - Name: Cosmos_gateway20_live_test - Path: sdk/cosmos/live-gateway20-matrix.json - Selection: sparse - GenerateVMJobs: true diff --git a/sdk/cosmos/ci.yml b/sdk/cosmos/ci.yml index e12fafc594b..0763902ed97 100644 --- a/sdk/cosmos/ci.yml +++ b/sdk/cosmos/ci.yml @@ -43,6 +43,21 @@ extends: CloudConfig: Public: ServiceConnection: azure-sdk-tests-cosmos + # Endpoint + master key for the pre-provisioned Gateway 2.0 ("thin client") + # account, surfaced from the `azure-sdk-tests-cosmos` service connection's + # secret variable group. Both `LiveTestMatrixConfigs` entries below see + # these env vars at job time; the standard Cosmos live tests ignore them + # while the Gateway 2.0 matrix entry consumes them via the `gateway20` + # test-category scaffolding (see + # `azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs`). + # + # This mirrors the Java Cosmos SDK's thin-client live-test setup + # (`sdk/cosmos/tests.yml` in `Azure/azure-sdk-for-java`), which adds a + # second matrix entry pointing at a pre-provisioned thin-client account + # rather than spinning up a dedicated pipeline. 
+ EnvVars: + AZURE_COSMOS_GW20_ENDPOINT: $(thinclient-test-endpoint) + AZURE_COSMOS_GW20_KEY: $(thinclient-test-key) MatrixConfigs: - Name: Cosmos_release Path: sdk/cosmos/release-platform-matrix.json @@ -54,3 +69,14 @@ extends: Path: sdk/cosmos/live-platform-matrix.json Selection: sparse GenerateVMJobs: true + # Gateway 2.0 ("thin client") live tests run against a pre-provisioned + # account that is NOT created per-pipeline-run. The + # `ArmTemplateParameters` block in `live-gateway20-matrix.json` is kept + # so the deploy step still fires (the matrix machinery requires it), + # but the provisioned account is unused — tests connect to the dedicated + # thin-client account via the `AZURE_COSMOS_GW20_ENDPOINT/_KEY` env + # vars wired above. + - Name: Cosmos_gateway20_live_test + Path: sdk/cosmos/live-gateway20-matrix.json + Selection: sparse + GenerateVMJobs: true From d0109784b9b40197418979ba05d44f7dea67adea Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Thu, 30 Apr 2026 12:17:06 -0700 Subject: [PATCH 41/48] Re-export driver fault-injection types from SDK MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replaces the SDK's parallel fault-injection type system with re-exports from azure_data_cosmos_driver::fault_injection. The duplicate types existed only because the SDK and driver each had their own injection module; with the driver fault-injection now wired into the gateway transport via a shared rule slice, there is no reason to keep two copies. Removed: - sdk/.../fault_injection/{condition,result,rule}.rs (~590 lines) - driver_bridge::sdk_fi_rules_to_driver_fi_rules() and the entire feature-gated translation block (~127 lines) - The dual-state Arc/Arc wiring on the SDK rule — no longer needed because both transports now share the same Arc. - Dead 'passthrough_statuses' tracking on the SDK FaultClient. Kept SDK-owned: - FaultInjectionClientBuilder — produces the gateway-side FaultClient HTTP transport. 
- A small fault_operation_for_sdk(SdkOpType, SdkResType) adapter so CosmosRequest::add_fault_injection_headers can stamp the right operation tag using the SDK enum (the driver's FaultOperationType::from_operation_and_resource takes driver enums). Other touches: - Promoted driver FaultInjectionRule::increment_hit_count to pub so the SDK FaultClient (separate crate) can call it. - CustomResponse construction in tests now uses CustomResponseBuilder because driver fields are private; accessor calls (status_code(), body(), region(), etc.) replace direct field reads. - Forward-compat: the FaultInjectionErrorType match in apply_fault gets a wildcard arm because the driver enum is #[non_exhaustive]. - Updated sdk-to-driver-cutover.md to describe the post-cutover architecture and drop references to the deleted bridge function. Net diff: -1005 / +144 lines. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../docs/sdk-to-driver-cutover.md | 55 +--- .../src/clients/cosmos_client_builder.rs | 7 +- .../azure_data_cosmos/src/cosmos_request.rs | 7 +- .../azure_data_cosmos/src/driver_bridge.rs | 128 --------- .../src/fault_injection/client_builder.rs | 2 +- .../src/fault_injection/condition.rs | 104 -------- .../src/fault_injection/http_client.rs | 100 ++++--- .../src/fault_injection/mod.rs | 233 +++++----------- .../src/fault_injection/result.rs | 240 ----------------- .../src/fault_injection/rule.rs | 249 ------------------ .../emulator_tests/cosmos_fault_injection.rs | 2 +- .../tests/framework/mock_account.rs | 16 +- .../src/fault_injection/rule.rs | 6 +- 13 files changed, 144 insertions(+), 1005 deletions(-) delete mode 100644 sdk/cosmos/azure_data_cosmos/src/fault_injection/condition.rs delete mode 100644 sdk/cosmos/azure_data_cosmos/src/fault_injection/result.rs delete mode 100644 sdk/cosmos/azure_data_cosmos/src/fault_injection/rule.rs diff --git a/sdk/cosmos/azure_data_cosmos/docs/sdk-to-driver-cutover.md 
b/sdk/cosmos/azure_data_cosmos/docs/sdk-to-driver-cutover.md index bb3b53e0b8b..8d4e3eaf20a 100644 --- a/sdk/cosmos/azure_data_cosmos/docs/sdk-to-driver-cutover.md +++ b/sdk/cosmos/azure_data_cosmos/docs/sdk-to-driver-cutover.md @@ -315,42 +315,24 @@ The gateway pipeline tracked this via `CosmosRequest` (which held the final URL) ## Fault Injection Wiring -When cutting `read_item` over to the driver, the SDK's fault injection tests initially failed because the two execution paths (gateway and driver) have **independent fault injection systems**. This section documents how they were connected. +The SDK no longer ships a parallel fault-injection type system. All fault-injection types — [`FaultInjectionRule`], [`FaultInjectionCondition`], [`FaultInjectionResult`], [`CustomResponse`], [`FaultInjectionErrorType`], [`FaultOperationType`], and the matching builders — are re-exported directly from the driver crate (`azure_data_cosmos_driver::fault_injection`) by `azure_data_cosmos::fault_injection`. The SDK only owns: -### Problem +- [`FaultInjectionClientBuilder`] — produces the `azure_core::http::Transport` that the SDK pipeline plugs in (i.e., a `FaultClient` HTTP client wrapper that evaluates driver rules against in-flight gateway requests). +- A small private `fault_operation_for_sdk(SdkOperationType, SdkResourceType) → Option<FaultOperationType>` adapter so `CosmosRequest::add_fault_injection_headers` can stamp the right operation tag on the outbound headers. -The SDK and driver each have their own fault injection module (`azure_data_cosmos::fault_injection` and `azure_data_cosmos_driver::fault_injection`). They define parallel but separate types (`FaultInjectionRule`, `FaultInjectionCondition`, `FaultInjectionResult`, etc.) with identical variants but different Rust types. Prior to this work, only the gateway pipeline received fault injection rules — the driver was built without them.
- -### Solution: Rule Translation with Shared State - -The bridge module (`driver_bridge.rs`) includes `sdk_fi_rules_to_driver_fi_rules()`, which translates SDK fault injection rules into driver fault injection rules. The translation covers: - -- `FaultOperationType` — variant-by-variant match (identical variant names) -- `FaultInjectionErrorType` — variant-by-variant match -- `FaultInjectionCondition` — `RegionName` → `Region`, operation type and container ID mapped directly -- `FaultInjectionResult` — `Duration` → `Option<Duration>`, probability copied -- Timing fields — `start_time: Instant` → `Option<Instant>`, `end_time` and `hit_limit` copied - -### Shared Mutable State - -SDK `FaultInjectionRule` has `enabled: Arc` and `hit_count: Arc` that tests mutate at runtime (`.disable()`, `.enable()`, `.hit_count()`). The driver's `FaultInjectionRuleBuilder` accepts external `Arc`s via `with_shared_state()`, so both the SDK gateway path and the driver path reference the **same atomic state**. This means: - -- Calling `.disable()` on the SDK rule also disables it in the driver -- Hit counts are shared — both paths increment the same counter -- Tests that toggle rules or assert hit counts work correctly across both paths +Because both transports (gateway and driver) consume the **same** `Arc<FaultInjectionRule>` instances now, there is no translation step and no shared-state plumbing — toggling `enable()`/`disable()`, hit-count increments, and `hit_limit` enforcement all happen against one canonical rule object. ### Wiring in `CosmosClientBuilder` In `CosmosClientBuilder::build()`: -1. Before the `FaultInjectionClientBuilder` is consumed for the gateway transport, `rules()` extracts a reference to the SDK rules -2. `sdk_fi_rules_to_driver_fi_rules()` translates them to driver rules with shared state -3. The translated rules are passed to `CosmosDriverRuntimeBuilder::with_fault_injection_rules()` -4. The SDK's `fault_injection` Cargo feature now forwards to the driver's `fault_injection` feature +1.
The `FaultInjectionClientBuilder::rules()` accessor returns `&[Arc<FaultInjectionRule>]` — already the driver type, so the SDK simply clones the slice (`fault_builder.rules().to_vec()`). +2. The cloned rules are passed to `CosmosDriverRuntimeBuilder::with_fault_injection_rules()` so the driver's own fault-injection HTTP client can evaluate them. +3. The `FaultInjectionClientBuilder` is then consumed to build the gateway transport, which wraps the inner `HttpClient` with a `FaultClient` that evaluates the same rules. ### Test Patterns for Future Cutover -When cutting over additional operations, **no additional fault injection wiring is needed** — it's handled once at the `CosmosClientBuilder` level. However, tests need to account for two behavioral differences: +When cutting over additional operations, **no fault-injection wiring changes are needed** — it's all wired once at `CosmosClientBuilder::build()`. However, tests need to account for two behavioral differences between gateway-routed and driver-routed operations: **`request_url()` returns `None` for driver-routed operations:** @@ -378,24 +360,7 @@ let rule = FaultInjectionRuleBuilder::new("test", error) This asymmetry will disappear once all operations are driver-routed, since there will be only one hit-counting path. -### `custom_response` Translation - -Translation of `CustomResponse` (synthetic HTTP responses) is not yet implemented. None of the current tests use custom responses for `ReadItem` operations. When needed, the bridge function should be extended to translate `CustomResponse` fields (`status_code`, `headers`, `body`). - -### Consolidating to Driver Fault Injection After Cutover - -The current dual-system architecture (SDK fault injection + driver fault injection + translation bridge) exists only because the cutover is incremental — some operations still go through the gateway while others go through the driver. Once **all** operations are routed through the driver: - -1.
**Drop `azure_data_cosmos::fault_injection`** — the SDK's HTTP-client-level fault interception module becomes unreachable. Delete the entire `src/fault_injection/` directory. -2. **Re-export driver types** — the SDK re-exports the driver's fault injection types directly: - - ```rust - #[cfg(feature = "fault_injection")] - pub use azure_data_cosmos_driver::fault_injection; - ``` +### Final State After Cutover -3. **Remove the translation layer** — `sdk_fi_rules_to_driver_fi_rules()` in `driver_bridge.rs` and the `shared_enabled()`/`shared_hit_count()` accessors on the SDK rule are no longer needed. -4. **Simplify `CosmosClientBuilder`** — `with_fault_injection()` accepts `Vec<Arc<FaultInjectionRule>>` directly and passes them to `CosmosDriverRuntimeBuilder::with_fault_injection_rules()`. No translation, no cloning, no intermediary builder. -5. **Update tests** — tests construct driver `FaultInjectionRule` directly (same builders, same API) instead of SDK rules. +Once **all** operations are routed through the driver, the SDK-side `FaultInjectionClientBuilder` and `FaultClient` HTTP wrapper become unreachable too — the driver-runtime fault-injection HTTP client is the single source of truth. At that point `azure_data_cosmos::fault_injection` collapses into a pure `pub use azure_data_cosmos_driver::fault_injection;` re-export (or is dropped entirely). -At that point the SDK has **no fault injection logic of its own** — it's a pass-through to the driver, matching the overall "SDK as thin wrapper" goal. The driver is the single source of truth for all transport-related concerns including fault injection.
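The shared-state model in the updated `sdk-to-driver-cutover.md` (gateway and driver transports holding the same `Arc<FaultInjectionRule>` handles) can be illustrated with a minimal, self-contained sketch. `Rule` and `Transport` are hypothetical stand-ins, not the driver's types: `Rule` models only the enabled flag and hit counter that tests mutate, and `Transport` plays the role of either the driver's fault-injection HTTP client or the SDK's `FaultClient`. The `AtomicBool`/`AtomicU32` fields are assumptions made for illustration, not the driver's actual field layout.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the driver's `FaultInjectionRule`: only the
/// state that tests mutate at runtime (enabled flag, hit count) is modeled.
struct Rule {
    id: String,
    enabled: AtomicBool,
    hit_count: AtomicU32,
}

impl Rule {
    fn new(id: &str) -> Self {
        Self {
            id: id.to_string(),
            enabled: AtomicBool::new(true),
            hit_count: AtomicU32::new(0),
        }
    }

    fn disable(&self) {
        self.enabled.store(false, Ordering::SeqCst);
    }

    fn is_enabled(&self) -> bool {
        self.enabled.load(Ordering::SeqCst)
    }

    fn increment_hit_count(&self) {
        self.hit_count.fetch_add(1, Ordering::SeqCst);
    }

    fn hit_count(&self) -> u32 {
        self.hit_count.load(Ordering::SeqCst)
    }
}

/// Hypothetical fault-injecting transport holding its own list of handles.
struct Transport {
    rules: Vec<Arc<Rule>>,
}

impl Transport {
    /// Applies the first enabled rule (recording a hit) and returns its id.
    fn evaluate(&self) -> Option<String> {
        let rule = self.rules.iter().find(|r| r.is_enabled())?;
        rule.increment_hit_count();
        Some(rule.id.clone())
    }
}

fn main() {
    let rules = vec![Arc::new(Rule::new("503-on-read"))];

    // Like `fault_builder.rules().to_vec()`: cloning a `Vec<Arc<_>>`
    // clones the handles, so both transports point at the same rule.
    let driver = Transport { rules: rules.clone() };
    let gateway = Transport { rules };

    driver.evaluate();
    gateway.evaluate();
    println!("shared hit count: {}", gateway.rules[0].hit_count()); // 2

    // Disabling through one handle is visible to the other transport.
    driver.rules[0].disable();
    println!("gateway after disable: {:?}", gateway.evaluate()); // None
}
```

Because `to_vec()` on a slice of `Arc`s copies only the reference-counted handles, both paths agree on toggles and hit counts without any translation or `with_shared_state()`-style plumbing.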
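The `FaultClient` applicability check in `http_client.rs` now reads the rule through accessors, and `start_time()` becomes optional (the patch's comment notes the driver default is always-active). The sketch below restates the start-time, end-time, and hit-limit checks as a standalone function. `RuleWindow` is a hypothetical struct and the `u32` hit-limit type is an assumption; only the comparison semantics (`now < start`, `now >= end_time`, `hit_count() >= hit_limit`) are taken from the patch.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the timing state read through the driver rule's
/// `start_time()`, `end_time()`, `hit_limit()`, and `hit_count()` accessors.
/// The `u32` counter type is an assumption.
#[derive(Clone, Copy, Debug)]
struct RuleWindow {
    start_time: Option<Instant>,
    end_time: Option<Instant>,
    hit_limit: Option<u32>,
    hit_count: u32,
}

/// Returns `true` if the rule may fire at `now`. A `None` bound is unbounded.
fn is_applicable(rule: &RuleWindow, now: Instant) -> bool {
    // Not started yet (an absent start time means "always active").
    if let Some(start) = rule.start_time {
        if now < start {
            return false;
        }
    }
    // Expired: the end time is exclusive.
    if let Some(end) = rule.end_time {
        if now >= end {
            return false;
        }
    }
    // Hit budget exhausted.
    if let Some(limit) = rule.hit_limit {
        if rule.hit_count >= limit {
            return false;
        }
    }
    true
}

fn main() {
    let now = Instant::now();
    let always_on = RuleWindow { start_time: None, end_time: None, hit_limit: None, hit_count: 0 };
    let not_started = RuleWindow { start_time: Some(now + Duration::from_secs(60)), ..always_on };
    let expired = RuleWindow { end_time: Some(now), ..always_on };
    let exhausted = RuleWindow { hit_limit: Some(3), hit_count: 3, ..always_on };

    println!("always_on:   {}", is_applicable(&always_on, now)); // true
    println!("not_started: {}", is_applicable(&not_started, now)); // false
    println!("expired:     {}", is_applicable(&expired, now)); // false
    println!("exhausted:   {}", is_applicable(&exhausted, now)); // false
}
```

Treating every bound as an `Option` keeps the default rule unconditionally active, which matches the switch from the SDK's always-set `start_time: Instant` to the driver's optional start time.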
diff --git a/sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client_builder.rs b/sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client_builder.rs index 13047835266..48fc04abbf1 100644 --- a/sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client_builder.rs +++ b/sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client_builder.rs @@ -331,9 +331,10 @@ impl CosmosClientBuilder { Option, Vec>, ) = if let Some(fault_builder) = self.fault_injection_builder { - // Translate rules for the driver before the builder is consumed. - let driver_rules = - crate::driver_bridge::sdk_fi_rules_to_driver_fi_rules(fault_builder.rules()); + // SDK fault-injection rules are now driver `FaultInjectionRule`s + // (re-exported through `crate::fault_injection`), so the driver + // can consume them directly without a translation step. + let driver_rules = fault_builder.rules().to_vec(); let fault_builder = match base_client { Some(client) => fault_builder.with_inner_client(client), None => fault_builder, diff --git a/sdk/cosmos/azure_data_cosmos/src/cosmos_request.rs b/sdk/cosmos/azure_data_cosmos/src/cosmos_request.rs index 4ee5e0f1a59..c56782dd66d 100644 --- a/sdk/cosmos/azure_data_cosmos/src/cosmos_request.rs +++ b/sdk/cosmos/azure_data_cosmos/src/cosmos_request.rs @@ -2,7 +2,7 @@ // Licensed under the MIT License. 
#[cfg(feature = "fault_injection")] -use crate::fault_injection::FaultOperationType; +use crate::fault_injection::fault_operation_for_sdk; use crate::operation_context::OperationType; use crate::options::ExcludedRegions; use crate::request_context::RequestContext; @@ -153,10 +153,7 @@ impl CosmosRequest { #[cfg(feature = "fault_injection")] pub fn add_fault_injection_headers(&mut self) { - let fault_op = FaultOperationType::from_operation_and_resource( - &self.operation_type, - &self.resource_type, - ); + let fault_op = fault_operation_for_sdk(&self.operation_type, &self.resource_type); if let Some(op) = fault_op { self.headers.insert( diff --git a/sdk/cosmos/azure_data_cosmos/src/driver_bridge.rs b/sdk/cosmos/azure_data_cosmos/src/driver_bridge.rs index 8e83734f694..31c1cc400c8 100644 --- a/sdk/cosmos/azure_data_cosmos/src/driver_bridge.rs +++ b/sdk/cosmos/azure_data_cosmos/src/driver_bridge.rs @@ -94,134 +94,6 @@ fn driver_response_headers_to_headers(cosmos_headers: &CosmosResponseHeaders) -> headers } -/// Translates SDK fault injection rules into driver fault injection rules. -/// -/// The `enabled` and `hit_count` state is shared between the SDK and driver -/// rules via `Arc`, so toggling a rule in tests affects both paths. 
-#[cfg(feature = "fault_injection")] -pub(crate) fn sdk_fi_rules_to_driver_fi_rules( - sdk_rules: &[std::sync::Arc], -) -> Vec> { - use crate::fault_injection::{ - FaultInjectionErrorType as SdkErrorType, FaultOperationType as SdkOpType, - }; - use azure_data_cosmos_driver::fault_injection::{ - self as driver_fi, FaultInjectionConditionBuilder as DriverConditionBuilder, - FaultInjectionResultBuilder as DriverResultBuilder, - FaultInjectionRuleBuilder as DriverRuleBuilder, - }; - use azure_data_cosmos_driver::options::Region; - - sdk_rules - .iter() - .map(|sdk_rule| { - // Translate condition - let mut cond_builder = DriverConditionBuilder::new(); - if let Some(op) = &sdk_rule.condition.operation_type { - let driver_op = match op { - SdkOpType::ReadItem => driver_fi::FaultOperationType::ReadItem, - SdkOpType::QueryItem => driver_fi::FaultOperationType::QueryItem, - SdkOpType::CreateItem => driver_fi::FaultOperationType::CreateItem, - SdkOpType::UpsertItem => driver_fi::FaultOperationType::UpsertItem, - SdkOpType::ReplaceItem => driver_fi::FaultOperationType::ReplaceItem, - SdkOpType::DeleteItem => driver_fi::FaultOperationType::DeleteItem, - SdkOpType::PatchItem => driver_fi::FaultOperationType::PatchItem, - SdkOpType::BatchItem => driver_fi::FaultOperationType::BatchItem, - SdkOpType::ChangeFeedItem => driver_fi::FaultOperationType::ChangeFeedItem, - SdkOpType::MetadataReadContainer => { - driver_fi::FaultOperationType::MetadataReadContainer - } - SdkOpType::MetadataReadDatabaseAccount => { - driver_fi::FaultOperationType::MetadataReadDatabaseAccount - } - SdkOpType::MetadataQueryPlan => { - driver_fi::FaultOperationType::MetadataQueryPlan - } - SdkOpType::MetadataPartitionKeyRanges => { - driver_fi::FaultOperationType::MetadataPartitionKeyRanges - } - }; - cond_builder = cond_builder.with_operation_type(driver_op); - } - if let Some(region) = &sdk_rule.condition.region { - cond_builder = cond_builder.with_region(Region::new(region.to_string())); - } - if let 
Some(container_id) = &sdk_rule.condition.container_id { - cond_builder = cond_builder.with_container_id(container_id.clone()); - } - if let Some(transport_kind) = sdk_rule.condition.transport_kind { - // SDK and driver share the same `TransportKind` type - // (the SDK re-exports it from the driver), so no enum - // conversion is required. - cond_builder = cond_builder.with_transport_kind(transport_kind); - } - - // Translate result - let mut result_builder = DriverResultBuilder::new(); - if let Some(err) = &sdk_rule.result.error_type { - let driver_err = match err { - SdkErrorType::InternalServerError => { - driver_fi::FaultInjectionErrorType::InternalServerError - } - SdkErrorType::TooManyRequests => { - driver_fi::FaultInjectionErrorType::TooManyRequests - } - SdkErrorType::ReadSessionNotAvailable => { - driver_fi::FaultInjectionErrorType::ReadSessionNotAvailable - } - SdkErrorType::Timeout => driver_fi::FaultInjectionErrorType::Timeout, - SdkErrorType::ServiceUnavailable => { - driver_fi::FaultInjectionErrorType::ServiceUnavailable - } - SdkErrorType::PartitionIsGone => { - driver_fi::FaultInjectionErrorType::PartitionIsGone - } - SdkErrorType::WriteForbidden => { - driver_fi::FaultInjectionErrorType::WriteForbidden - } - SdkErrorType::DatabaseAccountNotFound => { - driver_fi::FaultInjectionErrorType::DatabaseAccountNotFound - } - SdkErrorType::ConnectionError => { - driver_fi::FaultInjectionErrorType::ConnectionError - } - SdkErrorType::ResponseTimeout => { - driver_fi::FaultInjectionErrorType::ResponseTimeout - } - }; - result_builder = result_builder.with_error(driver_err); - } - if sdk_rule.result.delay > std::time::Duration::ZERO { - result_builder = result_builder.with_delay(sdk_rule.result.delay); - } - let prob = sdk_rule.result.probability(); - if prob < 1.0 { - result_builder = result_builder.with_probability(prob); - } - // Note: custom_response translation is skipped for now. - // None of the current failing tests use custom responses. 
- - // Build driver rule with shared state - let mut rule_builder = - DriverRuleBuilder::new(sdk_rule.id.clone(), result_builder.build()) - .with_condition(cond_builder.build()) - .with_shared_state(sdk_rule.shared_enabled(), sdk_rule.shared_hit_count()); - - if let Some(end_time) = sdk_rule.end_time { - rule_builder = rule_builder.with_end_time(end_time); - } - if let Some(hit_limit) = sdk_rule.hit_limit { - rule_builder = rule_builder.with_hit_limit(hit_limit); - } - // SDK start_time is always set (Instant::now() by default). - // Driver start_time is Option. - rule_builder = rule_builder.with_start_time(sdk_rule.start_time); - - std::sync::Arc::new(rule_builder.build()) - }) - .collect() -} - #[cfg(test)] mod tests { use super::*; diff --git a/sdk/cosmos/azure_data_cosmos/src/fault_injection/client_builder.rs b/sdk/cosmos/azure_data_cosmos/src/fault_injection/client_builder.rs index 6ed1d330f90..97c5b9571f3 100644 --- a/sdk/cosmos/azure_data_cosmos/src/fault_injection/client_builder.rs +++ b/sdk/cosmos/azure_data_cosmos/src/fault_injection/client_builder.rs @@ -8,7 +8,7 @@ use std::sync::Arc; use azure_core::http::Transport; use super::http_client::FaultClient; -use super::rule::FaultInjectionRule; +use super::FaultInjectionRule; /// Builder for creating a fault injection client. /// diff --git a/sdk/cosmos/azure_data_cosmos/src/fault_injection/condition.rs b/sdk/cosmos/azure_data_cosmos/src/fault_injection/condition.rs deleted file mode 100644 index 0dff61ff2de..00000000000 --- a/sdk/cosmos/azure_data_cosmos/src/fault_injection/condition.rs +++ /dev/null @@ -1,104 +0,0 @@ -// Copyright (c) Microsoft Corporation. All rights reserved. -// Licensed under the MIT License. - -//! Defines conditions for when fault injection rules should be applied. - -use super::{FaultOperationType, TransportKind}; -use crate::regions::Region; - -/// Defines the condition under which a fault injection rule should be applied. 
-#[derive(Clone, Default, Debug)] -pub struct FaultInjectionCondition { - /// The type of operation to which the fault injection applies. - pub operation_type: Option, - /// The region to which the fault injection applies. - pub region: Option, - /// The container ID to which the fault injection applies. - pub container_id: Option, - /// Restricts the rule to a specific transport kind (Gateway 1.x vs - /// Gateway 2.0). When `None`, the rule applies regardless of which - /// dataplane transport carries the request. When `Some`, the rule - /// only applies to clients bound to that transport — metadata - /// clients always skip the rule. - pub transport_kind: Option, -} - -/// Builder for creating a FaultInjectionCondition. -#[derive(Default)] -pub struct FaultInjectionConditionBuilder { - operation_type: Option, - region: Option, - container_id: Option, - transport_kind: Option, -} - -impl FaultInjectionConditionBuilder { - /// Creates a new FaultInjectionConditionBuilder with default values. - pub fn new() -> Self { - Self { - operation_type: None, - region: None, - container_id: None, - transport_kind: None, - } - } - - /// Sets the operation type to which the fault injection applies. - pub fn with_operation_type(mut self, operation_type: FaultOperationType) -> Self { - self.operation_type = Some(operation_type); - self - } - - /// Sets the region to which the fault injection applies. - pub fn with_region(mut self, region: Region) -> Self { - self.region = Some(region); - self - } - - /// Sets the container ID to which the fault injection applies. - pub fn with_container_id(mut self, container_id: impl Into) -> Self { - self.container_id = Some(container_id.into()); - self - } - - /// Restricts the rule to a specific transport kind (e.g., - /// [`TransportKind::Gateway20`]). When set, the rule only matches - /// requests carried by the matching transport. 
- pub fn with_transport_kind(mut self, transport_kind: TransportKind) -> Self { - self.transport_kind = Some(transport_kind); - self - } - - /// Builds the FaultInjectionCondition. - pub fn build(self) -> FaultInjectionCondition { - FaultInjectionCondition { - operation_type: self.operation_type, - region: self.region, - container_id: self.container_id, - transport_kind: self.transport_kind, - } - } -} - -#[cfg(test)] -mod tests { - use super::{FaultInjectionConditionBuilder, TransportKind}; - - #[test] - fn builder_default() { - let builder = FaultInjectionConditionBuilder::default(); - let condition = builder.build(); - assert!(condition.operation_type.is_none()); - assert!(condition.region.is_none()); - assert!(condition.container_id.is_none()); - assert!(condition.transport_kind.is_none()); - } - - #[test] - fn with_transport_kind_sets_field() { - let condition = FaultInjectionConditionBuilder::new() - .with_transport_kind(TransportKind::Gateway20) - .build(); - assert_eq!(condition.transport_kind, Some(TransportKind::Gateway20)); - } -} diff --git a/sdk/cosmos/azure_data_cosmos/src/fault_injection/http_client.rs b/sdk/cosmos/azure_data_cosmos/src/fault_injection/http_client.rs index 1a18e20e196..55887c012a3 100644 --- a/sdk/cosmos/azure_data_cosmos/src/fault_injection/http_client.rs +++ b/sdk/cosmos/azure_data_cosmos/src/fault_injection/http_client.rs @@ -3,10 +3,9 @@ // cSpell:ignore evals -use super::result::FaultInjectionResult; -use super::rule::FaultInjectionRule; -use super::FaultInjectionErrorType; -use super::FaultOperationType; +use super::{ + FaultInjectionErrorType, FaultInjectionResult, FaultInjectionRule, FaultOperationType, +}; use crate::constants::{self, SubStatusCode}; use async_trait::async_trait; use azure_core::error::ErrorKind; @@ -43,20 +42,22 @@ impl FaultClient { return false; } - // Check if the rule has started - if now < rule.start_time { - return false; + // Check if the rule has started (driver default = always-active) + if let 
Some(start) = rule.start_time() { + if now < start { + return false; + } } // Check if the rule has expired - if let Some(end_time) = rule.end_time { + if let Some(end_time) = rule.end_time() { if now >= end_time { return false; } } // Check if we've exceeded the hit limit on the rule - if let Some(hit_limit) = rule.hit_limit { + if let Some(hit_limit) = rule.hit_limit() { if rule.hit_count() >= hit_limit { return false; } @@ -67,11 +68,11 @@ impl FaultClient { /// Checks if the request matches the rule's condition. fn matches_condition(&self, request: &Request, rule: &FaultInjectionRule) -> bool { - let condition = &rule.condition; + let condition = rule.condition(); let mut matches = true; // Check operation type if specified - if let Some(expected_op) = condition.operation_type { + if let Some(expected_op) = condition.operation_type() { let request_op = request .headers() .get_optional_str(&constants::FAULT_INJECTION_OPERATION) @@ -89,14 +90,14 @@ impl FaultClient { } // Check region if specified - if let Some(region) = &condition.region { + if let Some(region) = condition.region() { if !request.url().as_str().contains(region.as_str()) { matches = false; } } // Check container ID if specified - if let Some(container_id) = &condition.container_id { + if let Some(container_id) = condition.container_id() { if !request.url().as_str().contains(container_id) { matches = false; } @@ -127,16 +128,16 @@ impl FaultClient { } // Check for custom response first (takes precedence over error injection) - if let Some(ref custom) = server_error.custom_response { + if let Some(custom) = server_error.custom_response() { return Some(Ok(AsyncRawResponse::from_bytes( - custom.status_code, - custom.headers.clone(), - custom.body.clone(), + custom.status_code(), + custom.headers().clone(), + custom.body().to_vec(), ))); } // Generate the appropriate error based on error type - let error_type = match server_error.error_type { + let error_type = match server_error.error_type() { 
Some(et) => et, None => return None, // No error type set, pass through }; @@ -196,6 +197,14 @@ impl FaultClient { Some(SubStatusCode::DATABASE_ACCOUNT_NOT_FOUND), "Database Account Not Found - Injected fault", ), + // The driver enum is `#[non_exhaustive]`; new variants + // surface as a generic injected service-unavailable until the + // SDK is taught to render them. + _ => ( + StatusCode::ServiceUnavailable, + None, + "Unknown injected fault", + ), }; let raw_response = sub_status.map(|ss| { @@ -221,10 +230,7 @@ impl FaultClient { impl HttpClient for FaultClient { async fn execute_request(&self, request: &Request) -> azure_core::Result { // Find applicable rule and clone the result if needed - let (fault_result, matched_rule): ( - Option, - Option>, - ) = { + let fault_result: Option = { let rules = self.rules.lock().unwrap(); let mut applicable_rule_index: Option = None; @@ -239,9 +245,9 @@ impl HttpClient for FaultClient { if let Some(index) = applicable_rule_index { let rule = &rules[index]; rule.increment_hit_count(); - (Some(rule.result.clone()), Some(Arc::clone(rule))) + Some(rule.result().clone()) } else { - (None, None) + None } }; @@ -262,33 +268,19 @@ impl HttpClient for FaultClient { .remove(constants::FAULT_INJECTION_OPERATION); // No fault injection or delay-only fault, proceed with actual request - let result = self.inner.execute_request(&clean_request).await; - - // Record response status only for true spy rules: no error_type, - // no custom_response, and no delay. This excludes probability-skipped - // faults and any rule that injected a delay. 
- if let (Some(rule), Some(ref fr), Ok(ref response)) = - (&matched_rule, &fault_result, &result) - { - if fr.error_type.is_none() - && fr.custom_response.is_none() - && fr.delay == Duration::ZERO - { - rule.record_passthrough_status(response.status()); - } - } - - result + self.inner.execute_request(&clean_request).await }; // Apply delay after the request is sent if let Some(result) = fault_result { - if result.delay > Duration::ZERO { - let delay = azure_core::time::Duration::try_from(result.delay) - .unwrap_or(azure_core::time::Duration::ZERO); - azure_core::async_runtime::get_async_runtime() - .sleep(delay) - .await; + if let Some(delay) = result.delay() { + if delay > Duration::ZERO { + let delay = azure_core::time::Duration::try_from(delay) + .unwrap_or(azure_core::time::Duration::ZERO); + azure_core::async_runtime::get_async_runtime() + .sleep(delay) + .await; + } } } @@ -301,13 +293,13 @@ mod tests { use super::FaultClient; use crate::constants::{SubStatusCode, SUB_STATUS}; use crate::fault_injection::{ - CustomResponse, FaultInjectionConditionBuilder, FaultInjectionErrorType, + CustomResponseBuilder, FaultInjectionConditionBuilder, FaultInjectionErrorType, FaultInjectionResultBuilder, FaultInjectionRuleBuilder, FaultOperationType, }; use crate::regions::Region; use async_trait::async_trait; use azure_core::error::ErrorKind; - use azure_core::http::{headers::Headers, AsyncRawResponse, HttpClient, Method, Request, Url}; + use azure_core::http::{AsyncRawResponse, HttpClient, Method, Request, Url}; use std::sync::atomic::{AtomicU32, Ordering}; use std::sync::Arc; use std::time::{Duration, Instant}; @@ -730,11 +722,11 @@ mod tests { let body = b"{\"id\": \"test-account\"}".to_vec(); let result = FaultInjectionResultBuilder::new() - .with_custom_response(CustomResponse { - status_code: azure_core::http::StatusCode::Ok, - headers: Headers::new(), - body: body.clone(), - }) + .with_custom_response( + CustomResponseBuilder::new(azure_core::http::StatusCode::Ok) + 
.with_body(body.clone()) + .build(), + ) .build(); let rule = FaultInjectionRuleBuilder::new("custom-response-rule", result).build(); diff --git a/sdk/cosmos/azure_data_cosmos/src/fault_injection/mod.rs b/sdk/cosmos/azure_data_cosmos/src/fault_injection/mod.rs index cec36071201..85be04b9f9f 100644 --- a/sdk/cosmos/azure_data_cosmos/src/fault_injection/mod.rs +++ b/sdk/cosmos/azure_data_cosmos/src/fault_injection/mod.rs @@ -3,9 +3,18 @@ //! Fault injection framework for testing Cosmos DB client behavior under error conditions. //! -//! This module provides a fault injection framework that intercepts HTTP requests at the -//! transport layer, below the retry policy. When a fault is injected, it triggers the same -//! retry and failover behavior as a real service error. This enables testing of: +//! This module wraps the driver's fault-injection primitives — every type +//! except [`FaultInjectionClientBuilder`] is re-exported directly from +//! [`azure_data_cosmos_driver::fault_injection`]. The SDK only owns the +//! [`FaultInjectionClientBuilder`] (which produces an [`azure_core::http::Transport`] +//! that the SDK pipeline plugs in) and a small adapter for translating SDK-side +//! [`OperationType`](crate::operation_context::OperationType) / +//! [`ResourceType`](crate::resource_context::ResourceType) pairs into the +//! driver's [`FaultOperationType`]. +//! +//! Below the transport layer, fault injection intercepts HTTP requests and +//! triggers the same retry and failover behavior as a real service error. +//! It enables testing of: //! //! - Error handling for various HTTP status codes (503, 500, 429, 408, etc.) //! - Retry logic and backoff behavior @@ -27,7 +36,7 @@ //! configured builder to [`CosmosClientBuilder::with_fault_injection()`](crate::CosmosClientBuilder::with_fault_injection) //! to enable fault injection and wrap the HTTP transport with a fault-injecting client. //! 
- [`FaultInjectionCondition`] — Defines when a fault should be applied, filtering by -//! operation type, region, or container ID. +//! operation type, region, container ID, or transport kind. //! - [`FaultInjectionResult`] — Defines what error to inject, including error type, delay, //! and probability. //! - [`FaultInjectionRule`] — Combines a condition with a result and additional controls @@ -92,26 +101,18 @@ //! Rules are evaluated in the order they were added. The first matching rule is applied. //! All specified conditions in a [`FaultInjectionCondition`] must match (AND logic): //! if no conditions are specified, the rule matches all requests. -//! mod client_builder; -mod condition; mod http_client; -mod result; -mod rule; - -use std::fmt; -use std::str::FromStr; - -use crate::operation_context::OperationType; -use crate::resource_context::ResourceType; pub use client_builder::FaultInjectionClientBuilder; -pub use condition::{FaultInjectionCondition, FaultInjectionConditionBuilder}; -pub use result::{ - CustomResponse, CustomResponseBuilder, FaultInjectionResult, FaultInjectionResultBuilder, + +#[doc(inline)] +pub use azure_data_cosmos_driver::fault_injection::{ + CustomResponse, CustomResponseBuilder, FaultInjectionCondition, FaultInjectionConditionBuilder, + FaultInjectionErrorType, FaultInjectionResult, FaultInjectionResultBuilder, FaultInjectionRule, + FaultInjectionRuleBuilder, FaultOperationType, }; -pub use rule::{FaultInjectionRule, FaultInjectionRuleBuilder}; /// Re-export of the driver's [`TransportKind`](azure_data_cosmos_driver::diagnostics::TransportKind) /// enum so SDK consumers can scope fault-injection rules to a specific @@ -119,156 +120,58 @@ pub use rule::{FaultInjectionRule, FaultInjectionRuleBuilder}; /// driver crate directly. pub use azure_data_cosmos_driver::diagnostics::TransportKind; -/// Represents different server error types that can be injected for fault testing. 
-#[derive(Clone, Copy, Debug, PartialEq, Eq)] -pub enum FaultInjectionErrorType { - /// 500 from server. - InternalServerError, - /// 429 from server. - TooManyRequests, - /// 404-1002 from server. - ReadSessionNotAvailable, - /// 408 from server. - Timeout, - /// Simulate service unavailable (503). - ServiceUnavailable, - /// 410-1002 from server. - PartitionIsGone, - /// 403-3 Forbidden from server. - WriteForbidden, - /// 403-1008 Forbidden from server. - DatabaseAccountNotFound, - /// Simulates a connection failure (e.g., connection refused, DNS failure). - /// Produces an `ErrorKind::Io` error, not an HTTP response error. - ConnectionError, - /// Simulates a response timeout (request sent but no response received). - /// Produces an `ErrorKind::Io` error, not an HTTP response error. - ResponseTimeout, -} - -/// The type of operation to which the fault injection applies. -#[derive(Clone, Copy, Debug, PartialEq, Eq)] -pub enum FaultOperationType { - /// Read items. - ReadItem, - /// Query items. - QueryItem, - /// Create item. - CreateItem, - /// Upsert item. - UpsertItem, - /// Replace item. - ReplaceItem, - /// Delete item. - DeleteItem, - /// Patch item. - PatchItem, - /// Batch item. - BatchItem, - /// Read change feed items. - ChangeFeedItem, - /// Read container request. - MetadataReadContainer, - /// Read database account request. - MetadataReadDatabaseAccount, - /// Query query plan request. - MetadataQueryPlan, - /// Partition key ranges request. - MetadataPartitionKeyRanges, -} +use crate::operation_context::OperationType as SdkOperationType; +use crate::resource_context::ResourceType as SdkResourceType; -impl FaultOperationType { - /// Returns the string representation of this operation type. 
- pub fn as_str(&self) -> &'static str { - match self { - FaultOperationType::ReadItem => "ReadItem", - FaultOperationType::QueryItem => "QueryItem", - FaultOperationType::CreateItem => "CreateItem", - FaultOperationType::UpsertItem => "UpsertItem", - FaultOperationType::ReplaceItem => "ReplaceItem", - FaultOperationType::DeleteItem => "DeleteItem", - FaultOperationType::PatchItem => "PatchItem", - FaultOperationType::BatchItem => "BatchItem", - FaultOperationType::ChangeFeedItem => "ChangeFeedItem", - FaultOperationType::MetadataReadContainer => "MetadataReadContainer", - FaultOperationType::MetadataReadDatabaseAccount => "MetadataReadDatabaseAccount", - FaultOperationType::MetadataQueryPlan => "MetadataQueryPlan", - FaultOperationType::MetadataPartitionKeyRanges => "MetadataPartitionKeyRanges", +/// Maps an SDK-side `(OperationType, ResourceType)` pair to the driver's +/// [`FaultOperationType`]. +/// +/// This mirrors `FaultOperationType::from_operation_and_resource` on the +/// driver, but takes SDK enums directly so SDK callers don't need to convert +/// to driver enums first. Returns `None` if the combination doesn't map to a +/// known fault operation type. +pub(crate) fn fault_operation_for_sdk( + operation_type: &SdkOperationType, + resource_type: &SdkResourceType, +) -> Option { + match (operation_type, resource_type) { + (SdkOperationType::Read, SdkResourceType::Documents) => Some(FaultOperationType::ReadItem), + (SdkOperationType::Query, SdkResourceType::Documents) => { + Some(FaultOperationType::QueryItem) } - } - - /// Converts an operation type and resource type pair into a fault injection operation type. - /// - /// Returns `None` if the combination does not map to a known fault operation type. 
- pub fn from_operation_and_resource( - operation_type: &OperationType, - resource_type: &ResourceType, - ) -> Option { - match (operation_type, resource_type) { - (OperationType::Read, ResourceType::Documents) => Some(FaultOperationType::ReadItem), - (OperationType::Query, ResourceType::Documents) => Some(FaultOperationType::QueryItem), - (OperationType::Create, ResourceType::Documents) => { - Some(FaultOperationType::CreateItem) - } - (OperationType::Upsert, ResourceType::Documents) => { - Some(FaultOperationType::UpsertItem) - } - (OperationType::Replace, ResourceType::Documents) => { - Some(FaultOperationType::ReplaceItem) - } - (OperationType::Delete, ResourceType::Documents) => { - Some(FaultOperationType::DeleteItem) - } - (OperationType::Patch, ResourceType::Documents) => Some(FaultOperationType::PatchItem), - (OperationType::Batch, ResourceType::Documents) => Some(FaultOperationType::BatchItem), - (OperationType::ReadFeed, ResourceType::Documents) => { - Some(FaultOperationType::ChangeFeedItem) - } - (OperationType::Read, ResourceType::Containers) => { - Some(FaultOperationType::MetadataReadContainer) - } - (OperationType::Read, ResourceType::DatabaseAccount) => { - Some(FaultOperationType::MetadataReadDatabaseAccount) - } - (OperationType::QueryPlan, ResourceType::Documents) => { - Some(FaultOperationType::MetadataQueryPlan) - } - (OperationType::ReadFeed, ResourceType::PartitionKeyRanges) => { - Some(FaultOperationType::MetadataPartitionKeyRanges) - } - _ => None, + (SdkOperationType::Create, SdkResourceType::Documents) => { + Some(FaultOperationType::CreateItem) } - } -} - -impl fmt::Display for FaultOperationType { - fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { - f.write_str(self.as_str()) - } -} - -impl FromStr for FaultOperationType { - type Err = (); - - /// Parses a string into a `FaultOperationType`. - /// - /// Returns `Err(())` if the string is not a recognized operation type. 
- fn from_str(s: &str) -> Result { - match s { - "ReadItem" => Ok(FaultOperationType::ReadItem), - "QueryItem" => Ok(FaultOperationType::QueryItem), - "CreateItem" => Ok(FaultOperationType::CreateItem), - "UpsertItem" => Ok(FaultOperationType::UpsertItem), - "ReplaceItem" => Ok(FaultOperationType::ReplaceItem), - "DeleteItem" => Ok(FaultOperationType::DeleteItem), - "PatchItem" => Ok(FaultOperationType::PatchItem), - "BatchItem" => Ok(FaultOperationType::BatchItem), - "ChangeFeedItem" => Ok(FaultOperationType::ChangeFeedItem), - "MetadataReadContainer" => Ok(FaultOperationType::MetadataReadContainer), - "MetadataReadDatabaseAccount" => Ok(FaultOperationType::MetadataReadDatabaseAccount), - "MetadataQueryPlan" => Ok(FaultOperationType::MetadataQueryPlan), - "MetadataPartitionKeyRanges" => Ok(FaultOperationType::MetadataPartitionKeyRanges), - _ => Err(()), + (SdkOperationType::Upsert, SdkResourceType::Documents) => { + Some(FaultOperationType::UpsertItem) + } + (SdkOperationType::Replace, SdkResourceType::Documents) => { + Some(FaultOperationType::ReplaceItem) + } + (SdkOperationType::Delete, SdkResourceType::Documents) => { + Some(FaultOperationType::DeleteItem) + } + (SdkOperationType::Patch, SdkResourceType::Documents) => { + Some(FaultOperationType::PatchItem) + } + (SdkOperationType::Batch, SdkResourceType::Documents) => { + Some(FaultOperationType::BatchItem) + } + (SdkOperationType::ReadFeed, SdkResourceType::Documents) => { + Some(FaultOperationType::ChangeFeedItem) + } + (SdkOperationType::Read, SdkResourceType::Containers) => { + Some(FaultOperationType::MetadataReadContainer) + } + (SdkOperationType::Read, SdkResourceType::DatabaseAccount) => { + Some(FaultOperationType::MetadataReadDatabaseAccount) + } + (SdkOperationType::QueryPlan, SdkResourceType::Documents) => { + Some(FaultOperationType::MetadataQueryPlan) + } + (SdkOperationType::ReadFeed, SdkResourceType::PartitionKeyRanges) => { + Some(FaultOperationType::MetadataPartitionKeyRanges) } + _ => None, 
} } diff --git a/sdk/cosmos/azure_data_cosmos/src/fault_injection/result.rs b/sdk/cosmos/azure_data_cosmos/src/fault_injection/result.rs deleted file mode 100644 index af0db1a8ebd..00000000000 --- a/sdk/cosmos/azure_data_cosmos/src/fault_injection/result.rs +++ /dev/null @@ -1,240 +0,0 @@ -// Copyright (c) Microsoft Corporation. All rights reserved. -// Licensed under the MIT License. - -//! Defines fault injection results including server errors. - -use std::time::Duration; - -use azure_core::http::{ - headers::{HeaderName, HeaderValue, Headers}, - StatusCode, -}; - -use crate::constants::SubStatusCode; - -use super::FaultInjectionErrorType; - -/// A synthetic response to return when a fault injection rule matches. -/// -/// Instead of injecting an error, this returns a successful response with -/// the specified status code, headers, and body. Useful for mocking service -/// responses such as `GetDatabaseAccount` in tests. -#[derive(Clone, Debug)] -pub struct CustomResponse { - /// The HTTP status code for the synthetic response. - pub status_code: StatusCode, - /// The headers for the synthetic response. - pub headers: Headers, - /// The body for the synthetic response. - pub body: Vec, -} - -/// Builder for creating a [`CustomResponse`]. -/// -/// Provides a fluent API for constructing synthetic HTTP responses -/// for fault injection testing. -/// -/// # Example -/// -/// ```rust -/// use azure_data_cosmos::fault_injection::CustomResponseBuilder; -/// use azure_core::http::StatusCode; -/// -/// let response = CustomResponseBuilder::new(StatusCode::Forbidden) -/// .with_sub_status(3) -/// .with_body(b"Write Forbidden".to_vec()) -/// .build(); -/// -/// assert_eq!(response.status_code, StatusCode::Forbidden); -/// ``` -pub struct CustomResponseBuilder { - status_code: StatusCode, - headers: Headers, - body: Vec, -} - -impl CustomResponseBuilder { - /// Creates a new builder with the specified HTTP status code. 
- pub fn new(status_code: StatusCode) -> Self { - Self { - status_code, - headers: Headers::new(), - body: Vec::new(), - } - } - - /// Adds a header to the response. - pub fn with_header( - mut self, - name: impl Into, - value: impl Into, - ) -> Self { - self.headers.insert(name, value); - self - } - - /// Sets the `x-ms-substatus` header to the given numeric sub-status code. - /// - /// This is a convenience method equivalent to calling - /// `with_header("x-ms-substatus", code.to_string())`. - pub fn with_sub_status(self, code: impl Into) -> Self { - let code = code.into(); - self.with_header(crate::constants::SUB_STATUS, code.to_string()) - } - - /// Sets the response body. - pub fn with_body(mut self, body: impl Into>) -> Self { - self.body = body.into(); - self - } - - /// Builds the [`CustomResponse`]. - pub fn build(self) -> CustomResponse { - CustomResponse { - status_code: self.status_code, - headers: self.headers, - body: self.body, - } - } -} - -/// Represents a server error to be injected. -#[derive(Clone, Debug)] -pub struct FaultInjectionResult { - /// The type of server error to inject. - pub error_type: Option, - /// A custom response to return instead of injecting an error. - pub custom_response: Option, - /// Delay before injecting the error. - pub delay: Duration, - /// Probability of injecting the error (0.0 to 1.0). - probability: f32, -} - -impl FaultInjectionResult { - /// Returns the probability of injecting the fault (0.0 to 1.0). - pub fn probability(&self) -> f32 { - self.probability - } -} - -/// Builder for creating a FaultInjectionResult. -pub struct FaultInjectionResultBuilder { - error_type: Option, - custom_response: Option, - delay: Duration, - probability: f32, -} - -impl FaultInjectionResultBuilder { - /// Creates a new FaultInjectionResultBuilder with default values. 
- pub fn new() -> Self { - Self { - error_type: None, - custom_response: None, - delay: Duration::ZERO, - probability: 1.0, - } - } - - /// Sets the error type to inject. - pub fn with_error(mut self, error_type: FaultInjectionErrorType) -> Self { - self.error_type = Some(error_type); - self - } - - /// Sets a custom response to return instead of injecting an error. - /// - /// When set, the fault injection rule returns this synthetic response - /// rather than forwarding the request to the real service. This takes - /// precedence over `error_type` if both are set. - pub fn with_custom_response(mut self, response: CustomResponse) -> Self { - self.custom_response = Some(response); - self - } - - /// Sets the delay before injecting the error. - pub fn with_delay(mut self, delay: Duration) -> Self { - self.delay = delay; - self - } - - /// Sets the probability of injecting the error (0.0 to 1.0). - pub fn with_probability(mut self, probability: f32) -> Self { - self.probability = probability.clamp(0.0, 1.0); - self - } - - /// Builds the FaultInjectionResult. 
- /// - pub fn build(self) -> FaultInjectionResult { - FaultInjectionResult { - error_type: self.error_type, - custom_response: self.custom_response, - delay: self.delay, - probability: self.probability, - } - } -} - -impl Default for FaultInjectionResultBuilder { - fn default() -> Self { - Self::new() - } -} - -#[cfg(test)] -mod tests { - use super::{CustomResponse, FaultInjectionResultBuilder}; - use crate::fault_injection::FaultInjectionErrorType; - use azure_core::http::{headers::Headers, StatusCode}; - use std::time::Duration; - - #[test] - fn builder_default_values() { - let error = FaultInjectionResultBuilder::new() - .with_error(FaultInjectionErrorType::Timeout) - .build(); - - assert_eq!(error.error_type.unwrap(), FaultInjectionErrorType::Timeout); - assert_eq!(error.delay, Duration::ZERO); - assert!((error.probability() - 1.0).abs() < f32::EPSILON); - } - - #[test] - fn builder_probability_clamped_above() { - let error = FaultInjectionResultBuilder::new() - .with_error(FaultInjectionErrorType::ServiceUnavailable) - .with_probability(1.5) - .build(); - - assert!((error.probability() - 1.0).abs() < f32::EPSILON); - } - - #[test] - fn builder_probability_clamped_below() { - let error = FaultInjectionResultBuilder::new() - .with_error(FaultInjectionErrorType::ServiceUnavailable) - .with_probability(-0.5) - .build(); - - assert!(error.probability().abs() < f32::EPSILON); - } - - #[test] - fn builder_with_custom_response() { - let body = b"{\"test\": true}".to_vec(); - let result = FaultInjectionResultBuilder::new() - .with_custom_response(CustomResponse { - status_code: StatusCode::Ok, - headers: Headers::new(), - body: body.clone(), - }) - .build(); - - assert!(result.error_type.is_none()); - let custom = result.custom_response.unwrap(); - assert_eq!(custom.status_code, StatusCode::Ok); - assert_eq!(custom.body, body); - } -} diff --git a/sdk/cosmos/azure_data_cosmos/src/fault_injection/rule.rs b/sdk/cosmos/azure_data_cosmos/src/fault_injection/rule.rs 
deleted file mode 100644 index 2d87aa8bc50..00000000000 --- a/sdk/cosmos/azure_data_cosmos/src/fault_injection/rule.rs +++ /dev/null @@ -1,249 +0,0 @@ -// Copyright (c) Microsoft Corporation. All rights reserved. -// Licensed under the MIT License. - -//! Defines fault injection rules that combine conditions and results. - -use std::sync::atomic::{AtomicBool, AtomicU32, Ordering}; -use std::sync::{Arc, Mutex}; -use std::time::Instant; - -use azure_core::http::StatusCode; - -use super::condition::FaultInjectionCondition; -use super::result::FaultInjectionResult; - -/// A fault injection rule that defines when and how to inject faults. -#[derive(Debug)] -pub struct FaultInjectionRule { - /// The condition under which to inject the fault. - pub condition: FaultInjectionCondition, - /// The result to inject when the condition is met. - pub result: FaultInjectionResult, - /// The absolute time at which the rule becomes active. - pub start_time: Instant, - /// The absolute time at which the rule expires, if set. - pub end_time: Option, - /// The total hit limit of the rule. - pub hit_limit: Option, - /// Unique identifier for the fault injection scenario. - pub id: String, - /// Whether the rule is currently enabled. - enabled: Arc, - /// Number of times the rule has been matched (including matches where no fault was injected). - hit_count: Arc, - /// HTTP status codes of responses for matched requests that passed through without fault injection. - passthrough_statuses: Mutex>, -} - -/// Cloning snapshots the current `hit_count` and `enabled` state rather than -/// resetting them, so a clone of a rule that has been hit 5 times starts at 5. 
-impl Clone for FaultInjectionRule { - fn clone(&self) -> Self { - Self { - condition: self.condition.clone(), - result: self.result.clone(), - start_time: self.start_time, - end_time: self.end_time, - hit_limit: self.hit_limit, - id: self.id.clone(), - enabled: Arc::new(AtomicBool::new(self.enabled.load(Ordering::SeqCst))), - hit_count: Arc::new(AtomicU32::new(self.hit_count.load(Ordering::SeqCst))), - passthrough_statuses: Mutex::new(self.passthrough_statuses.lock().unwrap().clone()), - } - } -} - -impl FaultInjectionRule { - /// Returns whether the rule is currently enabled. - pub fn is_enabled(&self) -> bool { - self.enabled.load(Ordering::SeqCst) - } - - /// Enables the rule. - pub fn enable(&self) { - self.enabled.store(true, Ordering::SeqCst); - } - - /// Disables the rule. - pub fn disable(&self) { - self.enabled.store(false, Ordering::SeqCst); - } - - /// Returns the number of times this rule has been matched. - /// - /// The hit count is incremented each time the rule's condition matches a - /// request, regardless of whether the fault was actually applied (e.g., - /// probability-based skipping still increments the count). - pub fn hit_count(&self) -> u32 { - self.hit_count.load(Ordering::SeqCst) - } - - /// Increments the hit count by one. - pub(super) fn increment_hit_count(&self) { - self.hit_count.fetch_add(1, Ordering::SeqCst); - } - - /// Resets the hit count to zero. - pub fn reset_hit_count(&self) { - self.hit_count.store(0, Ordering::SeqCst); - } - - /// Returns a shared reference to the enabled flag for cross-path state sharing. - pub(crate) fn shared_enabled(&self) -> Arc { - Arc::clone(&self.enabled) - } - - /// Returns a shared reference to the hit count for cross-path state sharing. - pub(crate) fn shared_hit_count(&self) -> Arc { - Arc::clone(&self.hit_count) - } - - /// Records the HTTP status code of a response for a matched request that - /// passed through without fault injection (spy/passthrough mode). 
- pub(super) fn record_passthrough_status(&self, status: StatusCode) { - self.passthrough_statuses.lock().unwrap().push(status); - } - - /// Returns the HTTP status codes of responses for matched requests that - /// passed through without fault injection. - /// - /// When a rule matches a request but does not inject a fault (e.g., no - /// `error_type` or `custom_response` is set), the real service response - /// status is recorded here. This enables "spy" rules that observe requests - /// without modifying them. - /// - /// The history grows unbounded for the lifetime of the rule. This is - /// designed for test scenarios with a bounded number of requests. - pub fn passthrough_statuses(&self) -> Vec { - self.passthrough_statuses.lock().unwrap().clone() - } -} - -/// Builder for creating a fault injection rule. -pub struct FaultInjectionRuleBuilder { - /// The condition under which to inject the fault. - condition: FaultInjectionCondition, - /// The result to inject when the condition is met. - result: FaultInjectionResult, - /// The absolute time at which the rule becomes active. - start_time: Instant, - /// The absolute time at which the rule expires. - end_time: Option, - /// The total hit limit of the rule. - hit_limit: Option, - /// Unique identifier for the fault injection scenario. - id: String, -} - -impl FaultInjectionRuleBuilder { - /// Creates a new FaultInjectionRuleBuilder with default values. - /// - /// By default the rule starts immediately and never expires. - pub fn new(id: impl Into, result: FaultInjectionResult) -> Self { - Self { - condition: FaultInjectionCondition::default(), - result, - start_time: Instant::now(), - end_time: None, - hit_limit: None, - id: id.into(), - } - } - - /// Sets the condition for when to inject the fault. - pub fn with_condition(mut self, condition: FaultInjectionCondition) -> Self { - self.condition = condition; - self - } - - /// Sets the result to inject when the condition is met. 
- pub fn with_result(mut self, result: FaultInjectionResult) -> Self { - self.result = result; - self - } - - /// Sets the absolute time at which the rule becomes active. - pub fn with_start_time(mut self, start_time: Instant) -> Self { - self.start_time = start_time; - self - } - - /// Sets the absolute time at which the rule expires. - pub fn with_end_time(mut self, end_time: Instant) -> Self { - self.end_time = Some(end_time); - self - } - - /// Sets the total hit limit of the rule. - pub fn with_hit_limit(mut self, hit_limit: u32) -> Self { - self.hit_limit = Some(hit_limit); - self - } - - /// Builds the FaultInjectionRule. - pub fn build(self) -> FaultInjectionRule { - FaultInjectionRule { - condition: self.condition, - result: self.result, - start_time: self.start_time, - end_time: self.end_time, - hit_limit: self.hit_limit, - id: self.id, - enabled: Arc::new(AtomicBool::new(true)), - hit_count: Arc::new(AtomicU32::new(0)), - passthrough_statuses: Mutex::new(Vec::new()), - } - } -} - -#[cfg(test)] -mod tests { - use super::FaultInjectionRuleBuilder; - use crate::fault_injection::{FaultInjectionErrorType, FaultInjectionResultBuilder}; - use std::time::Instant; - - fn create_test_error() -> crate::fault_injection::FaultInjectionResult { - FaultInjectionResultBuilder::new() - .with_error(FaultInjectionErrorType::Timeout) - .build() - } - - #[test] - fn builder_default_values() { - let before = Instant::now(); - let rule = FaultInjectionRuleBuilder::new("test-rule", create_test_error()).build(); - - assert_eq!(rule.id, "test-rule"); - assert!(rule.start_time >= before); - assert!(rule.start_time <= Instant::now()); - assert!(rule.end_time.is_none()); - assert!(rule.hit_limit.is_none()); - assert!(rule.condition.operation_type.is_none()); - assert!(rule.is_enabled()); - assert_eq!(rule.hit_count(), 0); - } - - #[test] - fn hit_count_increments() { - let rule = FaultInjectionRuleBuilder::new("hit-test", create_test_error()).build(); - - 
assert_eq!(rule.hit_count(), 0); - rule.increment_hit_count(); - assert_eq!(rule.hit_count(), 1); - rule.increment_hit_count(); - rule.increment_hit_count(); - assert_eq!(rule.hit_count(), 3); - } - - #[test] - fn reset_hit_count_clears_counter() { - let rule = FaultInjectionRuleBuilder::new("reset-test", create_test_error()).build(); - - rule.increment_hit_count(); - rule.increment_hit_count(); - assert_eq!(rule.hit_count(), 2); - - rule.reset_hit_count(); - assert_eq!(rule.hit_count(), 0); - } -} diff --git a/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/cosmos_fault_injection.rs b/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/cosmos_fault_injection.rs index 761a069655d..3416657a83a 100644 --- a/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/cosmos_fault_injection.rs +++ b/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/cosmos_fault_injection.rs @@ -951,7 +951,7 @@ pub async fn fault_injection_enable_disable_rule() -> Result<(), Box> .build(), ); - assert_eq!(rule.id, "enable-disable-test"); + assert_eq!(rule.id(), "enable-disable-test"); assert!(rule.is_enabled()); let rule_handle = Arc::clone(&rule); diff --git a/sdk/cosmos/azure_data_cosmos/tests/framework/mock_account.rs b/sdk/cosmos/azure_data_cosmos/tests/framework/mock_account.rs index 49fdb7dbad2..90e172ff713 100644 --- a/sdk/cosmos/azure_data_cosmos/tests/framework/mock_account.rs +++ b/sdk/cosmos/azure_data_cosmos/tests/framework/mock_account.rs @@ -4,8 +4,8 @@ //! Helpers for building mock `GetDatabaseAccount` responses in fault injection tests. 
// cSpell: disable -use azure_core::http::{headers::Headers, StatusCode}; -use azure_data_cosmos::fault_injection::CustomResponse; +use azure_core::http::StatusCode; +use azure_data_cosmos::fault_injection::{CustomResponse, CustomResponseBuilder}; use azure_data_cosmos::regions::Region; /// Builds a [`CustomResponse`] containing a valid `AccountProperties` JSON payload @@ -48,11 +48,9 @@ pub fn mock_database_account_response_for_account( multi_write: bool, ) -> CustomResponse { let body = mock_database_account_json(account_name, writable, readable, multi_write); - CustomResponse { - status_code: StatusCode::Ok, - headers: Headers::new(), - body: body.into_bytes(), - } + CustomResponseBuilder::new(StatusCode::Ok) + .with_body(body.into_bytes()) + .build() } /// Builds a valid `AccountProperties` JSON string with the specified regions. @@ -119,7 +117,7 @@ mod tests { ); let value: serde_json::Value = - serde_json::from_slice(&response.body).expect("should deserialize"); + serde_json::from_slice(response.body()).expect("should deserialize"); let writable = value["writableLocations"].as_array().unwrap(); let readable = value["readableLocations"].as_array().unwrap(); @@ -139,7 +137,7 @@ mod tests { ); let value: serde_json::Value = - serde_json::from_slice(&response.body).expect("should deserialize"); + serde_json::from_slice(response.body()).expect("should deserialize"); assert!(value["enableMultipleWriteLocations"].as_bool().unwrap()); } diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/fault_injection/rule.rs b/sdk/cosmos/azure_data_cosmos_driver/src/fault_injection/rule.rs index b600b008e83..45e8d2164aa 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/fault_injection/rule.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/fault_injection/rule.rs @@ -61,7 +61,11 @@ impl FaultInjectionRule { } /// Increments the hit count by one. 
- pub(crate) fn increment_hit_count(&self) { + /// + /// This is intended to be called by fault-injection HTTP clients (in the + /// driver and in the Cosmos SDK) when they decide to apply this rule to + /// an in-flight request, so that `hit_limit` can be honoured. + pub fn increment_hit_count(&self) { self.hit_count.fetch_add(1, Ordering::SeqCst); } From b3b5f919459c804d3c6690322e52927d0409e606 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Thu, 30 Apr 2026 12:21:12 -0700 Subject: [PATCH 42/48] Document Gateway 2.0 default and fault-injection refactor in CHANGELOGs Adds entries to both crate CHANGELOGs covering the changes from the prior three commits in this PR: - Driver 0.3.0 (Features Added): Gateway 2.0 transport is now the default, plus the new `with_gateway20_disabled()` opt-out toggle, with the latency / SLA caveat surfaced explicitly so operators know what they're getting into. - SDK 0.34.0 (Features Added): the matching public `CosmosClientBuilder::with_gateway20_disabled()` builder. - SDK 0.34.0 (Breaking Changes): the fault-injection re-export consolidation. The SDK previously had its own `FaultInjectionRule`/`FaultInjectionCondition`/etc. with public fields; field access is now via accessor methods on the driver re-exports. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- sdk/cosmos/azure_data_cosmos/CHANGELOG.md | 4 ++++ sdk/cosmos/azure_data_cosmos_driver/CHANGELOG.md | 2 ++ 2 files changed, 6 insertions(+) diff --git a/sdk/cosmos/azure_data_cosmos/CHANGELOG.md b/sdk/cosmos/azure_data_cosmos/CHANGELOG.md index 1d0c37f05c6..68bab0acd9f 100644 --- a/sdk/cosmos/azure_data_cosmos/CHANGELOG.md +++ b/sdk/cosmos/azure_data_cosmos/CHANGELOG.md @@ -4,8 +4,12 @@ ### Features Added +- Added `CosmosClientBuilder::with_gateway20_disabled(bool)` to opt out of the new Gateway 2.0 ("thin client") transport, which is now enabled by default.
Gateway 2.0 routes data-plane requests through a regional thin-client proxy that forwards RNTBD-over-HTTP/2 to the backend. Set this to `true` to fall back to the direct gateway transport — useful for workloads that depend on the published gateway latency SLAs (Gateway 2.0 is not currently covered by them) or that need the direct-gateway behavior for diagnostics. ([#4319](https://github.com/Azure/azure-sdk-for-rust/pull/4319)) + ### Breaking Changes +- Consolidated SDK fault-injection types as re-exports from `azure_data_cosmos_driver::fault_injection`. `FaultInjectionRule`, `FaultInjectionCondition`, `FaultInjectionResult`, `CustomResponse`, `FaultInjectionErrorType`, `FaultOperationType`, and the matching builders are now provided by the driver crate. Field access is via accessor methods (e.g., `rule.id()`, `condition.region()`, `response.body()`) rather than direct field reads. The SDK retains only `FaultInjectionClientBuilder` (gateway-side transport wrapper). ([#4319](https://github.com/Azure/azure-sdk-for-rust/pull/4319)) + ### Bugs Fixed ### Other Changes diff --git a/sdk/cosmos/azure_data_cosmos_driver/CHANGELOG.md b/sdk/cosmos/azure_data_cosmos_driver/CHANGELOG.md index 3a1550f16f5..8f2dbd121ad 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/CHANGELOG.md +++ b/sdk/cosmos/azure_data_cosmos_driver/CHANGELOG.md @@ -4,6 +4,8 @@ ### Features Added +- Added Gateway 2.0 ("thin client") transport support, enabled by default. The new transport routes data-plane requests through a regional thin-client proxy that forwards RNTBD-over-HTTP/2 to the backend. Set `ConnectionPoolOptionsBuilder::with_gateway20_disabled(true)` to fall back to the direct gateway transport. Note that Gateway 2.0 is **not currently covered by latency SLAs** and may impose higher per-request latency. 
([#4319](https://github.com/Azure/azure-sdk-for-rust/pull/4319)) + ### Breaking Changes ### Bugs Fixed From 6c89a1660ae1fbc012f649e7e56dd4be70ec04a6 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Thu, 30 Apr 2026 12:41:51 -0700 Subject: [PATCH 43/48] Fix rustdoc errors in fault_injection module re-exports MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The Build Analyze CI stage runs `cargo doc --no-deps --all-features` and was failing with two issues introduced by the fault-injection re-export refactor (commit d0109784b): 1. `error: redundant explicit link target` on the `TransportKind` re-export — the explicit `(azure_data_cosmos_driver::diagnostics::TransportKind)` target is redundant because rustdoc resolves `[`TransportKind`]` to the same destination via the in-scope `pub use`. 2. `warning: public documentation for `fault_injection` links to private item `crate::operation_context::OperationType`` (and the matching warning for `ResourceType`). Rustdoc's `private_intra_doc_links` lint flags these because the SDK's internal `OperationType` / `ResourceType` are crate-private; the module-level prose can refer to them by bare name without any intra-doc link target. Both fixes are docstring-only — no behavioral change. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../azure_data_cosmos/src/fault_injection/mod.rs | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos/src/fault_injection/mod.rs b/sdk/cosmos/azure_data_cosmos/src/fault_injection/mod.rs index 85be04b9f9f..ede03f2b5bd 100644 --- a/sdk/cosmos/azure_data_cosmos/src/fault_injection/mod.rs +++ b/sdk/cosmos/azure_data_cosmos/src/fault_injection/mod.rs @@ -8,9 +8,8 @@ //! [`azure_data_cosmos_driver::fault_injection`]. The SDK only owns the //! [`FaultInjectionClientBuilder`] (which produces an [`azure_core::http::Transport`] //! 
that the SDK pipeline plugs in) and a small adapter for translating SDK-side -//! [`OperationType`](crate::operation_context::OperationType) / -//! [`ResourceType`](crate::resource_context::ResourceType) pairs into the -//! driver's [`FaultOperationType`]. +//! `OperationType` / `ResourceType` pairs into the driver's +//! [`FaultOperationType`]. //! //! Below the transport layer, fault injection intercepts HTTP requests and //! triggers the same retry and failover behavior as a real service error. @@ -114,10 +113,9 @@ pub use azure_data_cosmos_driver::fault_injection::{ FaultInjectionRuleBuilder, FaultOperationType, }; -/// Re-export of the driver's [`TransportKind`](azure_data_cosmos_driver::diagnostics::TransportKind) -/// enum so SDK consumers can scope fault-injection rules to a specific -/// transport (Gateway 1.x vs Gateway 2.0) without depending on the -/// driver crate directly. +/// Re-export of the driver's [`TransportKind`] enum so SDK consumers can +/// scope fault-injection rules to a specific transport (Gateway 1.x vs +/// Gateway 2.0) without depending on the driver crate directly. pub use azure_data_cosmos_driver::diagnostics::TransportKind; use crate::operation_context::OperationType as SdkOperationType; From 2780650b31374b53d280b6ef03cfe62b5ea289d9 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Thu, 30 Apr 2026 12:52:48 -0700 Subject: [PATCH 44/48] =?UTF-8?q?Rename=20thinclient=20=E2=86=92=20Gateway?= =?UTF-8?q?=202.0=20per=20spec=20naming=20policy?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Per GATEWAY_20_SPEC.md §35, 'Gateway 2.0' is the canonical name in all Rust code, docs, and comments. 'Thin client' is reserved only for (a) Java/.NET source symbol references and (b) literal wire-header strings such as x-ms-thinclient-proxy-*. 
Renamed (Rust): - AccountProperties accessor methods on the driver-internal cache: has_thin_client_endpoints → has_gateway20_endpoints thin_client_writable_regions → gateway20_writable_regions thin_client_readable_regions → gateway20_readable_regions - routing_systems::parse_thin_client_locations and its parameter / local-variable names → parse_gateway20_locations. - E2E test fn gateway20_query_streams_through_thin_client → gateway20_query_streams. - Test fixture URL host central.thinclient.azure.com → central.gateway20.azure.com in operation_pipeline.rs. Renamed (docs/comments/CI): - Doc comments and inline comments referencing 'thin client' / 'thin-client' across CHANGELOGs, connection_pool, constants, adaptive_transport, account_metadata_cache, diagnostics_context, operation_pipeline, routing_systems, fault-injection tests, gateway20 pipeline tests, GATEWAY_20_SPEC, TRANSPORT_PIPELINE_SPEC, cosmos_client_builder (public-facing API doc), gateway20_e2e module doc, and ci.yml comments. - ci.yml secret variable names $(thinclient-test-endpoint) / $(thinclient-test-key) → $(gateway20-test-endpoint) / $(gateway20-test-key), mirroring the AZURE_COSMOS_GW20_* env-var naming convention. NOTE: the corresponding KeyVault-backed entries in the azure-sdk-tests-cosmos service connection's variable group may need a matching rename by engsys. - Log messages 'non-HTTPS thin-client endpoint URL' / 'Duplicate thin-client region with conflicting URL' rewritten to use 'Gateway 2.0'. - Dropped public-facing parentheticals like 'Gateway 2.0 ("thin client")' from CHANGELOG and CosmosClientBuilder docs per the user request to stop explaining the legacy term. Kept (per policy exceptions): - All x-ms-thinclient-* / x-ms-cosmos-use-thinclient wire header strings and the inline-doc comment block in constants.rs that explains why those wire names are unchanged (server-defined). 
- All Java/.NET symbol references (ThinClientStoreModel, ThinClientStoreClient, ThinClientHttpMessageHandler, thinClientProxyExcludedSet, useThinClientStoreModel, …) in spec text. - AccountProperties::thin_client_*_locations field names — they mirror the wire JSON properties via #[serde(rename_all = "camelCase")] and renaming would require explicit #[serde(rename = "…")] decorations. Added explanatory note to the field doc comments. - GATEWAY20_USE_THINCLIENT constant identifier — wire-string-mirroring suffix. - GATEWAY_20_SPEC.md historical 'formerly "thin client"' note, cspell directive, and the naming policy itself. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- sdk/cosmos/azure_data_cosmos/CHANGELOG.md | 2 +- .../src/clients/cosmos_client_builder.rs | 14 ++++----- .../tests/emulator_tests/gateway20_e2e.rs | 13 ++++----- .../azure_data_cosmos_driver/CHANGELOG.md | 2 +- .../docs/GATEWAY_20_SPEC.md | 12 ++++---- .../docs/TRANSPORT_PIPELINE_SPEC.md | 4 +-- .../azure_data_cosmos_driver/src/constants.rs | 2 +- .../src/diagnostics/diagnostics_context.rs | 4 +-- .../driver/cache/account_metadata_cache.rs | 26 +++++++++++------ .../src/driver/pipeline/operation_pipeline.rs | 4 +-- .../src/driver/routing/routing_systems.rs | 18 ++++++------ .../driver/transport/adaptive_transport.rs | 2 +- .../src/options/connection_pool.rs | 8 ++--- .../emulator_tests/driver_fault_injection.rs | 8 ++--- .../tests/gateway20_pipeline_tests.rs | 6 ++-- sdk/cosmos/ci.yml | 29 +++++++++---------- 16 files changed, 80 insertions(+), 74 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos/CHANGELOG.md b/sdk/cosmos/azure_data_cosmos/CHANGELOG.md index 68bab0acd9f..f7c9133a346 100644 --- a/sdk/cosmos/azure_data_cosmos/CHANGELOG.md +++ b/sdk/cosmos/azure_data_cosmos/CHANGELOG.md @@ -4,7 +4,7 @@ ### Features Added -- Added `CosmosClientBuilder::with_gateway20_disabled(bool)` to opt out of the new Gateway 2.0 ("thin client") transport, which is now enabled by default. 
Gateway 2.0 routes data-plane requests through a regional thin-client proxy that forwards RNTBD-over-HTTP/2 to the backend. Set this to `true` to fall back to the direct gateway transport — useful for workloads that depend on the published gateway latency SLAs (Gateway 2.0 is not currently covered by them) or that need the direct-gateway behavior for diagnostics. ([#4319](https://github.com/Azure/azure-sdk-for-rust/pull/4319)) +- Added `CosmosClientBuilder::with_gateway20_disabled(bool)` to opt out of the new Gateway 2.0 transport, which is now enabled by default. Gateway 2.0 routes data-plane requests through a regional proxy that forwards RNTBD-over-HTTP/2 to the backend. Set this to `true` to fall back to the direct gateway transport — useful for workloads that depend on the published gateway latency SLAs (Gateway 2.0 is not currently covered by them) or that need the direct-gateway behavior for diagnostics. ([#4319](https://github.com/Azure/azure-sdk-for-rust/pull/4319)) ### Breaking Changes diff --git a/sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client_builder.rs b/sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client_builder.rs index 48fc04abbf1..ad35092b727 100644 --- a/sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client_builder.rs +++ b/sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client_builder.rs @@ -93,11 +93,11 @@ pub struct CosmosClientBuilder { fault_injection_builder: Option, /// Fallback endpoints tried when the primary endpoint is unavailable. backup_endpoints: Vec, - /// Operator override for the Gateway 2.0 ("thin client") transport. + /// Operator override for the Gateway 2.0 transport. /// /// `None` (the default) leaves the underlying driver in charge of /// routing — Gateway 2.0 is selected automatically whenever the - /// account advertises a thin-client endpoint and HTTP/2 is allowed. + /// account advertises a Gateway 2.0 endpoint and HTTP/2 is allowed. 
/// `Some(true)` forces every request through the standard gateway /// transport via [`with_gateway20_disabled`](Self::with_gateway20_disabled); /// `Some(false)` explicitly opts in (matching the default behaviour). @@ -177,16 +177,16 @@ impl CosmosClientBuilder { self } - /// Disables the Gateway 2.0 ("thin client") transport for this client. + /// Disables the Gateway 2.0 transport for this client. /// /// Gateway 2.0 is the next-generation Cosmos DB dataplane transport: - /// SDK connections terminate at a regional thin-client proxy that + /// SDK connections terminate at a regional Gateway 2.0 proxy that /// forwards RNTBD-over-HTTP/2 to the backend. **Gateway 2.0 is enabled - /// by default** — whenever the account advertises a thin-client endpoint + /// by default** — whenever the account advertises a Gateway 2.0 endpoint /// the SDK routes eligible dataplane operations through it and falls /// back to the standard gateway only for operations Gateway 2.0 cannot /// serve (e.g. metadata requests or accounts that do not advertise a - /// thin-client endpoint). + /// Gateway 2.0 endpoint). /// /// Pass `true` to opt out and force every request through the standard /// gateway transport. The standard gateway path remains supported and @@ -195,7 +195,7 @@ impl CosmosClientBuilder { /// /// # Latency caveat /// - /// Gateway 2.0 traffic flows through a thin-client proxy that is + /// Gateway 2.0 traffic flows through a proxy that is /// **not currently covered by the regional Cosmos DB latency SLA**. 
/// Workloads with strict P99 latency requirements should opt out via /// `with_gateway20_disabled(true)` until the proxy reaches general diff --git a/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs b/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs index 323b0fe66d1..7d4f399ff5c 100644 --- a/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs +++ b/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs @@ -4,8 +4,8 @@ //! End-to-end tests for the Gateway 2.0 transport, exercised through the //! `azure_data_cosmos` SDK surface (not the underlying driver crate). //! -//! These tests run against a pre-provisioned Gateway 2.0 ("thin client") -//! account. The endpoint and primary key are read from the +//! These tests run against a pre-provisioned Gateway 2.0 account. The +//! endpoint and primary key are read from the //! `AZURE_COSMOS_GW20_ENDPOINT` and `AZURE_COSMOS_GW20_KEY` environment //! variables and gated by the `gateway20` test category. They are skipped by //! default; the main Cosmos Rust pipeline (`sdk/cosmos/ci.yml`) injects those @@ -73,7 +73,7 @@ fn live_credentials() -> Option<(String, String)> { /// /// `gateway20_disabled = false` opts the client in to Gateway 2.0; passing /// `true` exercises the operator-override path that pins the client to the -/// standard gateway even when the account advertises a thin-client endpoint. +/// standard gateway even when the account advertises a Gateway 2.0 endpoint. 
async fn build_client( endpoint: &str, key: &str, @@ -179,7 +179,7 @@ pub async fn gateway20_point_crud_round_trip() -> Result<(), Box Result<(), Box Result<(), Box> -{ +pub async fn gateway20_query_streams() -> Result<(), Box> { let Some((endpoint, key)) = live_credentials() else { return Ok(()); }; @@ -443,7 +442,7 @@ pub async fn gateway20_diagnostics_validation() -> Result<(), Box Wire-header strings (`x-ms-thinclient-*`) are server-defined and unchanged; the Rust-side identifiers use the `GATEWAY20_*` prefix. @@ -433,7 +433,7 @@ EDIT sdk/cosmos/azure_data_cosmos/src/... — Replace SDK-side get_hashe - **Verify** account metadata cache parses `thinClientReadableLocations` / `thinClientWritableLocations` into `CosmosEndpoint::gateway20_url` - **Confirm** `build_account_endpoint_state()` constructs `CosmosEndpoint::regional_with_gateway20()` correctly in multi-region accounts (existing tests at `routing_systems.rs:218–289` already cover this) -- **Verify** `AccountProperties::has_thin_client_endpoints()` is used as the gating signal per §3.1 +- **Verify** `AccountProperties::has_gateway20_endpoints()` is used as the gating signal per §3.1 - **Add** `x-ms-cosmos-use-thinclient` request header on account metadata fetches (new code) - **Test** endpoint discovery with live account that has gateway 2.0 enabled (handled by Phase 6 live pipeline) @@ -455,7 +455,7 @@ Account metadata response includes: #### Files Changed ``` -EDIT src/driver/cache/account_metadata_cache.rs — Verify thin client endpoint parsing (audit only) +EDIT src/driver/cache/account_metadata_cache.rs — Verify Gateway 2.0 endpoint parsing (audit only) EDIT src/driver/transport/cosmos_headers.rs — Add x-ms-cosmos-use-thinclient header (NEW) TEST src/driver/routing/routing_systems.rs — Add tests for read/write pairing edge cases ``` diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/TRANSPORT_PIPELINE_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/TRANSPORT_PIPELINE_SPEC.md index 
ec5782df19d..29771225e6d 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/TRANSPORT_PIPELINE_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/TRANSPORT_PIPELINE_SPEC.md @@ -2246,7 +2246,7 @@ endpoints are detected and used. No sharding yet — stream limit may be hit und > (`thinClient*Locations`) to determine the transport strategy, rather than runtime ALPN > negotiation against the gateway. This is sufficient because: > (1) reqwest with `http2` feature already performs ALPN automatically for `Http2Preferred`, -> (2) Gateway 2.0 is definitively identified by the presence of thin-client locations in +> (2) Gateway 2.0 is definitively identified by the presence of Gateway 2.0 locations in > account metadata, and (3) `http2_prior_knowledge()` for `Http2Only` skips ALPN entirely > (h2 is guaranteed). Runtime probing may be revisited if a use case arises where the > configuration-based approach is insufficient. @@ -2317,7 +2317,7 @@ Cut over all remaining operations and remove the old pipeline code. | 10.7 | Move fault injection tests to driver-level APIs | `tests/` | | 10.8 | Full integration test pass | `tests/` | -**What works after Step 10**: `azure_data_cosmos` is a thin client layer that builds +**What works after Step 10**: `azure_data_cosmos` is a thin SDK wrapper layer that builds `CosmosOperation` values and delegates all execution to the driver. Duplicate pipeline, retry, and routing code is removed. diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/constants.rs b/sdk/cosmos/azure_data_cosmos_driver/src/constants.rs index e4804d4df75..de5f39c5ad5 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/constants.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/constants.rs @@ -43,7 +43,7 @@ pub const GATEWAY20_RANGE_MAX: HeaderName = HeaderName::from_static("x-ms-thincl /// Account-metadata fetch hint. /// -/// Instructs the response to advertise thin-client endpoints. +/// Instructs the response to advertise Gateway 2.0 endpoints. 
pub const GATEWAY20_USE_THINCLIENT: HeaderName = HeaderName::from_static("x-ms-cosmos-use-thinclient"); diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/diagnostics/diagnostics_context.rs b/sdk/cosmos/azure_data_cosmos_driver/src/diagnostics/diagnostics_context.rs index a1534b7409a..043648ee0e7 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/diagnostics/diagnostics_context.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/diagnostics/diagnostics_context.rs @@ -159,7 +159,7 @@ pub enum TransportSecurity { /// The concrete transport kind used for a request. /// -/// This distinguishes the standard gateway path from Gateway 2.0 thin-client +/// This distinguishes the standard gateway path from Gateway 2.0 /// routing while keeping TLS/emulator concerns in [`TransportSecurity`]. #[derive(Clone, Copy, Debug, Default, PartialEq, Eq, Hash, Serialize)] #[serde(rename_all = "snake_case")] @@ -169,7 +169,7 @@ pub enum TransportKind { #[default] Gateway, - /// Gateway 2.0 thin-client transport. + /// Gateway 2.0 transport. Gateway20, } diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/cache/account_metadata_cache.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/cache/account_metadata_cache.rs index dcd0a412fb8..729f6cdb2aa 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/driver/cache/account_metadata_cache.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/cache/account_metadata_cache.rs @@ -145,15 +145,23 @@ pub(crate) struct AccountProperties { /// Raw JSON string containing query engine feature/configuration flags. pub query_engine_configuration: String, - /// Regional Gateway 2.0 endpoints accepting writes (thin client mode). + /// Regional Gateway 2.0 endpoints accepting writes. /// When present, indicates that Gateway 2.0 should be used for the /// dataplane transport instead of the standard gateway endpoint. 
+ /// + /// The Rust field name retains the `thin_client_*` prefix because the + /// struct uses `#[serde(rename_all = "camelCase")]` to deserialize from + /// the wire-defined property `thinClientWritableLocations`. Renaming the + /// field would break the serde mapping. #[serde(default)] pub thin_client_writable_locations: Vec, - /// Regional Gateway 2.0 endpoints for reads (thin client mode). + /// Regional Gateway 2.0 endpoints for reads. /// When present, indicates that Gateway 2.0 should be used for the /// dataplane transport instead of the standard gateway endpoint. + /// + /// See note on `thin_client_writable_locations` for why the field name + /// retains the `thin_client_*` prefix. #[serde(default)] pub thin_client_readable_locations: Vec, @@ -184,25 +192,25 @@ impl AccountProperties { .collect() } - /// Returns `true` if Gateway 2.0 (thin client) endpoints are available. + /// Returns `true` if Gateway 2.0 endpoints are available. /// - /// When thin client locations are present in the account properties, + /// When Gateway 2.0 locations are present in the account properties, /// the driver should use Gateway 2.0 for the dataplane transport. - pub(crate) fn has_thin_client_endpoints(&self) -> bool { + pub(crate) fn has_gateway20_endpoints(&self) -> bool { !self.thin_client_writable_locations.is_empty() || !self.thin_client_readable_locations.is_empty() } - /// Returns thin client (Gateway 2.0) writable locations, if any. - pub(crate) fn thin_client_writable_regions(&self) -> Vec { + /// Returns Gateway 2.0 writable locations, if any. + pub(crate) fn gateway20_writable_regions(&self) -> Vec { self.thin_client_writable_locations .iter() .map(|loc| loc.name.clone()) .collect() } - /// Returns thin client (Gateway 2.0) readable locations, if any. - pub(crate) fn thin_client_readable_regions(&self) -> Vec { + /// Returns Gateway 2.0 readable locations, if any. 
+ pub(crate) fn gateway20_readable_regions(&self) -> Vec { self.thin_client_readable_locations .iter() .map(|loc| loc.name.clone()) diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/operation_pipeline.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/operation_pipeline.rs index 850d13265e1..e66a583eb8d 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/operation_pipeline.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/operation_pipeline.rs @@ -1348,7 +1348,7 @@ mod tests { PartitionKey::from("pk1"), "doc1", )); - let gateway20_url = Url::parse("https://central.thinclient.azure.com:444/").unwrap(); + let gateway20_url = Url::parse("https://central.gateway20.azure.com:444/").unwrap(); let endpoint = CosmosEndpoint::regional_with_gateway20( "centralus".into(), Url::parse("https://central.documents.azure.com:443/").unwrap(), @@ -1384,7 +1384,7 @@ mod tests { assert_eq!(routing.transport_mode, TransportMode::Gateway20); assert_eq!( routing.selected_url.host_str(), - Some("central.thinclient.azure.com") + Some("central.gateway20.azure.com") ); assert_eq!( routing.endpoint_key, diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/routing/routing_systems.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/routing/routing_systems.rs index 8ab9d704bcb..0abb17ba0d9 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/driver/routing/routing_systems.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/routing/routing_systems.rs @@ -69,11 +69,11 @@ pub(crate) fn build_account_endpoint_state( fn build_preferred_endpoints( standard_locations: &[crate::driver::cache::AccountRegion], - thin_client_locations: &[crate::driver::cache::AccountRegion], + gateway20_locations: &[crate::driver::cache::AccountRegion], gateway20_enabled: bool, ) -> Vec { - let thin_client_urls = if gateway20_enabled { - parse_thin_client_locations(thin_client_locations) + let gateway20_urls = if gateway20_enabled { + 
parse_gateway20_locations(gateway20_locations) } else { HashMap::new() }; @@ -82,7 +82,7 @@ fn build_preferred_endpoints( for region in standard_locations { let url = region.database_account_endpoint.url().clone(); - let endpoint = thin_client_urls + let endpoint = gateway20_urls .get(®ion.name) .cloned() .map(|gateway20_url| { @@ -100,12 +100,12 @@ fn build_preferred_endpoints( endpoints } -fn parse_thin_client_locations( - thin_client_locations: &[crate::driver::cache::AccountRegion], +fn parse_gateway20_locations( + gateway20_locations: &[crate::driver::cache::AccountRegion], ) -> HashMap { let mut urls = HashMap::new(); - for region in thin_client_locations { + for region in gateway20_locations { let url = region.database_account_endpoint.url().clone(); if url.scheme() != "https" { @@ -113,7 +113,7 @@ fn parse_thin_client_locations( region = %region.name, endpoint = %region.database_account_endpoint, scheme = url.scheme(), - "Ignoring non-HTTPS thin-client endpoint URL" + "Ignoring non-HTTPS Gateway 2.0 endpoint URL" ); continue; } @@ -125,7 +125,7 @@ fn parse_thin_client_locations( region = %region.name, existing_url = %existing, new_url = %url, - "Duplicate thin-client region with conflicting URL; keeping first entry" + "Duplicate Gateway 2.0 region with conflicting URL; keeping first entry" ); } }) diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/adaptive_transport.rs b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/adaptive_transport.rs index 567f88c7f14..07a32bd4bac 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/adaptive_transport.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/adaptive_transport.rs @@ -19,7 +19,7 @@ use crate::options::ConnectionPoolOptions; /// `Gateway` is an unsharded HTTP/1.1 transport used when the gateway does not /// support HTTP/2. `ShardedGateway` is a per-endpoint sharded HTTP/2 transport /// used when HTTP/2 has been confirmed via the initialization probe. 
-/// `ShardedGateway20` is reserved for Gateway 2.0 thin-client requests and +/// `ShardedGateway20` is reserved for Gateway 2.0 requests and /// always uses HTTP/2 prior knowledge. #[derive(Clone)] pub(crate) enum AdaptiveTransport { diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/options/connection_pool.rs b/sdk/cosmos/azure_data_cosmos_driver/src/options/connection_pool.rs index 15a1958fd4d..74e11a455ee 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/options/connection_pool.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/options/connection_pool.rs @@ -209,7 +209,7 @@ impl ConnectionPoolOptions { /// Returns whether Gateway 2.0 is disabled for this pool. /// /// Gateway 2.0 is enabled by default whenever the account advertises a - /// thin-client endpoint and HTTP/2 is allowed. When this method returns + /// Gateway 2.0 endpoint and HTTP/2 is allowed. When this method returns /// `true` the driver routes every request through the standard gateway /// transport, regardless of the account advertisement. /// @@ -501,7 +501,7 @@ impl ConnectionPoolOptionsBuilder { /// Disables Gateway 2.0 for this pool. /// /// Gateway 2.0 is enabled by default whenever the account advertises a - /// thin-client endpoint and HTTP/2 is allowed. Pass `true` to force every + /// Gateway 2.0 endpoint and HTTP/2 is allowed. Pass `true` to force every /// request through the standard gateway transport regardless of the /// account advertisement (operator override). /// @@ -511,7 +511,7 @@ impl ConnectionPoolOptionsBuilder { /// /// # Latency caveat /// - /// Gateway 2.0 traffic flows through a thin-client proxy that is **not + /// Gateway 2.0 traffic flows through a Gateway 2.0 proxy that is **not /// currently covered by the regional Cosmos DB latency SLA**. Workloads /// with strict P99 latency requirements should call this method with /// `true` until the proxy reaches general availability. 
@@ -557,7 +557,7 @@ impl ConnectionPoolOptionsBuilder { )?; // Gateway 2.0 is enabled by default whenever HTTP/2 is allowed and - // the account advertises a thin-client endpoint. The flag uses a + // the account advertises a Gateway 2.0 endpoint. The flag uses a // negative-term name so that the absence of an opt-in is the on // state; operators disable Gateway 2.0 by setting this to `true`. // There is intentionally no `AZURE_COSMOS_*` env var that toggles diff --git a/sdk/cosmos/azure_data_cosmos_driver/tests/emulator_tests/driver_fault_injection.rs b/sdk/cosmos/azure_data_cosmos_driver/tests/emulator_tests/driver_fault_injection.rs index 4f5bafba4a2..e1c2770962a 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/tests/emulator_tests/driver_fault_injection.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/tests/emulator_tests/driver_fault_injection.rs @@ -338,7 +338,7 @@ pub async fn fault_injection_connection_error() -> Result<(), Box> { // ---------------------------------------------------------------------------- // // The following three tests lock in the retry/failover behavior the Gateway -// 2.0 transport must exhibit when the underlying thin-client connection fails. +// 2.0 transport must exhibit when the underlying Gateway 2.0 connection fails. // Each test exercises a distinct failure shape: // // - 503 Service Unavailable → regional failover @@ -349,7 +349,7 @@ pub async fn fault_injection_connection_error() -> Result<(), Box> { // kind filter — there is no `with_transport_kind(TransportKind::Gateway20)` // today. As a result, faults injected here apply to whichever transport happens // to be selected at dispatch time. To reliably exercise these against Gateway -// 2.0, the Phase 6 CI matrix must run them on a live thin-client account +// 2.0, the Phase 6 CI matrix must run them on a live Gateway 2.0 account // (`testCategory = 'gateway20'`); the emulator does not yet expose Gateway // 2.0 endpoints. 
See `docs/GATEWAY_20_SPEC.md` (Phase 6) for the harness gap. @@ -358,7 +358,7 @@ pub async fn fault_injection_connection_error() -> Result<(), Box> { /// The rule is scoped to [`TransportKind::Gateway20`] so it does not also /// fire on standard-gateway requests issued during account discovery. The /// emulator does not yet expose Gateway 2.0 endpoints, so this test is -/// gated behind the `gateway20` test category until CI gains a thin-client +/// gated behind the `gateway20` test category until CI gains a Gateway 2.0 /// account; see `docs/GATEWAY_20_SPEC.md` (Phase 6). #[tokio::test] #[cfg_attr( @@ -478,7 +478,7 @@ pub async fn gateway20_request_timeout_cross_region_for_reads() -> Result<(), Bo /// The rule is scoped to [`TransportKind::Gateway20`] so it does not also /// fire on standard-gateway requests. The emulator does not yet expose /// Gateway 2.0 endpoints, so this test is gated behind the `gateway20` -/// test category until CI gains a thin-client account. +/// test category until CI gains a Gateway 2.0 account. #[tokio::test] #[cfg_attr( not(test_category = "gateway20"), diff --git a/sdk/cosmos/azure_data_cosmos_driver/tests/gateway20_pipeline_tests.rs b/sdk/cosmos/azure_data_cosmos_driver/tests/gateway20_pipeline_tests.rs index f67b1d038e5..0d5c66f15f0 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/tests/gateway20_pipeline_tests.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/tests/gateway20_pipeline_tests.rs @@ -11,7 +11,7 @@ //! ## Categories //! //! 1. **Operator override** — the operator can opt out of Gateway 2.0 even when -//! the account advertises a thin-client endpoint. Verified via the public +//! the account advertises a Gateway 2.0 endpoint. Verified via the public //! [`ConnectionPoolOptions::with_gateway20_disabled`] toggle. //! //! 2. 
**Operation eligibility** — operations that Gateway 2.0 does not yet @@ -188,7 +188,7 @@ async fn probe(runtime: &Arc) { /// Verifies that the operator override flag (`with_gateway20_disabled(true)`) /// is honored end-to-end at the connection-pool level. When the flag is set, /// the runtime must not select the Gateway 2.0 transport even if account -/// metadata advertises a thin-client endpoint. +/// metadata advertises a Gateway 2.0 endpoint. /// /// We assert the contract structurally via `ConnectionPoolOptions`: when the /// flag is `true`, `gateway20_disabled()` reports `true`, and the @@ -288,7 +288,7 @@ async fn stored_proc_execute_falls_back_to_standard_gateway() { /// Once Gateway 2.0 has dispatched a request, the recorded /// `RequestDiagnostics` for that request must indicate `TransportKind::Gateway20`. /// -/// This contract requires a live thin-client account. The inside-crate test +/// This contract requires a live Gateway 2.0 account. The inside-crate test /// `transport_pipeline::tests::gateway20_pipeline_records_transport_kind` /// already covers the wiring at the unit-test level; this standalone test is /// the live-account companion. diff --git a/sdk/cosmos/ci.yml b/sdk/cosmos/ci.yml index 0763902ed97..4fa9d99cbbe 100644 --- a/sdk/cosmos/ci.yml +++ b/sdk/cosmos/ci.yml @@ -43,21 +43,21 @@ extends: CloudConfig: Public: ServiceConnection: azure-sdk-tests-cosmos - # Endpoint + master key for the pre-provisioned Gateway 2.0 ("thin client") - # account, surfaced from the `azure-sdk-tests-cosmos` service connection's + # Endpoint + master key for the pre-provisioned Gateway 2.0 account, + # surfaced from the `azure-sdk-tests-cosmos` service connection's # secret variable group. 
Both `LiveTestMatrixConfigs` entries below see # these env vars at job time; the standard Cosmos live tests ignore them # while the Gateway 2.0 matrix entry consumes them via the `gateway20` # test-category scaffolding (see # `azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs`). # - # This mirrors the Java Cosmos SDK's thin-client live-test setup - # (`sdk/cosmos/tests.yml` in `Azure/azure-sdk-for-java`), which adds a - # second matrix entry pointing at a pre-provisioned thin-client account - # rather than spinning up a dedicated pipeline. + # This mirrors the Java Cosmos SDK's Gateway 2.0 live-test + # setup (`sdk/cosmos/tests.yml` in `Azure/azure-sdk-for-java`), which + # adds a second matrix entry pointing at a pre-provisioned Gateway 2.0 + # account rather than spinning up a dedicated pipeline. EnvVars: - AZURE_COSMOS_GW20_ENDPOINT: $(thinclient-test-endpoint) - AZURE_COSMOS_GW20_KEY: $(thinclient-test-key) + AZURE_COSMOS_GW20_ENDPOINT: $(gateway20-test-endpoint) + AZURE_COSMOS_GW20_KEY: $(gateway20-test-key) MatrixConfigs: - Name: Cosmos_release Path: sdk/cosmos/release-platform-matrix.json @@ -69,13 +69,12 @@ extends: Path: sdk/cosmos/live-platform-matrix.json Selection: sparse GenerateVMJobs: true - # Gateway 2.0 ("thin client") live tests run against a pre-provisioned - # account that is NOT created per-pipeline-run. The - # `ArmTemplateParameters` block in `live-gateway20-matrix.json` is kept - # so the deploy step still fires (the matrix machinery requires it), - # but the provisioned account is unused — tests connect to the dedicated - # thin-client account via the `AZURE_COSMOS_GW20_ENDPOINT/_KEY` env - # vars wired above. + # Gateway 2.0 live tests run against a pre-provisioned account that is + # NOT created per-pipeline-run. 
The `ArmTemplateParameters` block in + # `live-gateway20-matrix.json` is kept so the deploy step still fires + # (the matrix machinery requires it), but the provisioned account is + # unused — tests connect to the dedicated Gateway 2.0 account via the + # `AZURE_COSMOS_GW20_ENDPOINT/_KEY` env vars wired above. - Name: Cosmos_gateway20_live_test Path: sdk/cosmos/live-gateway20-matrix.json Selection: sparse From ed1570576b37f035fd9fa60967b8a1091165ec56 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Thu, 30 Apr 2026 16:12:33 -0700 Subject: [PATCH 45/48] Move Gateway 2.0 cspell:ignore words to sdk/cosmos/.cspell.json Per the cspell convention (.github/skills/check-spelling/SKILL.md), service-level word ignores belong in sdk/{service}/.cspell.json rather than as file-level cspell:ignore directives at the top of individual files. Changes: - sdk/cosmos/.cspell.json: add 7 entries to the ignoreWords array, preserving the existing alphabetical layout: - cooldown - directconnectivity - Mgmt - myacct - pushdown - thinclient - unparseable ALPN and cutover were already in the list. THINCLIENT (uppercase) is covered by the lowercase entry since cspell is case-insensitive by default. - sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md: remove the inline file-level cspell:ignore directive from the top of the spec; the words now live in the per-service config above. - sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client_builder.rs: fix two British 'behaviour' spellings to 'behavior' to match the rest of the repo (the words were introduced by recent Gateway 2.0 doc rewrites). - sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs: fix one 'behavioural' to 'behavioral' for the same reason. cspell now reports zero issues across the 25 cosmos files modified on this branch. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- sdk/cosmos/.cspell.json | 7 +++++++ .../azure_data_cosmos/src/clients/cosmos_client_builder.rs | 4 ++-- .../tests/emulator_tests/gateway20_e2e.rs | 2 +- .../azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md | 1 - 4 files changed, 10 insertions(+), 4 deletions(-) diff --git a/sdk/cosmos/.cspell.json b/sdk/cosmos/.cspell.json index 117ff7d1c0e..9049b148ecd 100644 --- a/sdk/cosmos/.cspell.json +++ b/sdk/cosmos/.cspell.json @@ -26,12 +26,14 @@ "chinanorth", "cloneable", "colls", + "cooldown", "cosmosclient", "cutover", "Daad", "dedicatedgateway", "derefs", "dhat", + "directconnectivity", "Dmaster", "documentdb", "dotproduct", @@ -73,10 +75,12 @@ "LLZA", "memcheck", "MEMORYSTATUSEX", + "Mgmt", "moka", "multihash", "murmurhash", "myaccount", + "myacct", "mycoll", "mycollection", "mycontainer", @@ -110,6 +114,7 @@ "PPAF", "PPCB", "pushback", + "pushdown", "pyroscope", "qname", "qself", @@ -144,6 +149,7 @@ "testcontainer", "testdata", "testdb", + "thinclient", "uaecentral", "uaenorth", "udfs", @@ -151,6 +157,7 @@ "uksouth", "ukwest", "uncontended", + "unparseable", "unsharded", "upsert", "upserted", diff --git a/sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client_builder.rs b/sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client_builder.rs index ad35092b727..3f7f84c4d5d 100644 --- a/sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client_builder.rs +++ b/sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client_builder.rs @@ -100,7 +100,7 @@ pub struct CosmosClientBuilder { /// account advertises a Gateway 2.0 endpoint and HTTP/2 is allowed. /// `Some(true)` forces every request through the standard gateway /// transport via [`with_gateway20_disabled`](Self::with_gateway20_disabled); - /// `Some(false)` explicitly opts in (matching the default behaviour). + /// `Some(false)` explicitly opts in (matching the default behavior). 
gateway20_disabled: Option, } @@ -206,7 +206,7 @@ impl CosmosClientBuilder { /// /// * `disabled` - `true` to suppress Gateway 2.0 and force the standard /// gateway transport; `false` (or leaving the builder untouched) keeps - /// the default Gateway 2.0 behaviour. + /// the default Gateway 2.0 behavior. pub fn with_gateway20_disabled(mut self, disabled: bool) -> Self { self.gateway20_disabled = Some(disabled); self diff --git a/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs b/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs index 7d4f399ff5c..f154d9b0227 100644 --- a/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs +++ b/sdk/cosmos/azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs @@ -395,7 +395,7 @@ pub async fn gateway20_change_feed_latest_version() { /// TODO: extend this test to assert `TransportKind::Gateway20` once /// `CosmosDiagnostics` surfaces the driver transport kind. Today the SDK /// `CosmosDiagnostics` only carries `activity_id` and `server_duration_ms`, -/// so the strongest behavioural assertion we can make is that those fields +/// so the strongest behavioral assertion we can make is that those fields /// are populated when the request was routed through the Gateway 2.0 /// pipeline. 
#[tokio::test] diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index 149285dfa8c..7c0db63f7fb 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -1,4 +1,3 @@ - # Gateway 2.0 Design Spec for Rust Driver & SDK **Status**: Draft / Iterating From 45595560db546a5832d5778a6ce32600fa4dbc98 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Mon, 11 May 2026 14:07:27 -0700 Subject: [PATCH 46/48] =?UTF-8?q?Spec=20G2->G1=20connectivity-failure=20ci?= =?UTF-8?q?rcuit=20breaker=20(=C2=A74.3)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add §4.3 to GATEWAY_20_SPEC.md covering the firewall-fallback question: should the Rust SDK auto-degrade from Gateway 2.0 to Gateway V1 when a client is behind a firewall that blocks the G2 endpoints? Decision: Option B (client-scoped circuit breaker), not fail-fast. Java/.NET have neither today; Rust adopts the breaker so 'default-on' G2 does not break customers behind restrictive egress. Spec covers (with rubber-duck-resolved blockers): - Trip condition uses a structured TransportFailureClass enum classified at error-construction time in driver/transport — not a late is_connectivity_failure() bool over an opaque error. Variants split into 'counts toward trip' (DnsFailure, ConnectRefused, ConnectTimeout, TlsHandshakeFailure, Http2NegotiationFailure, PreResponseReset) and 'does not count' (HttpResponse, PostSendTimeout, MidStreamFailure, AuthFailure, DecodeError, OperationCancelled). Edge cases for setup-phase timeouts and pre/post-headers reset spelled out. - Trip threshold differentiated for multi-endpoint accounts (N=2 distinct endpoints in window W=60s) vs single-endpoint accounts (2 failures on the sole endpoint in W) — avoids the hair-trigger N=1 default for one-region accounts. 
- Operation-scope contract: breaker is read at op start (per §3), current op completes on its chosen transport, only subsequent ops see the open breaker. - Concurrency model: atomic Closed->Open / Open->HalfOpen / probe reservation; exactly-one semantics for trips, probes, and warn-log. - Endpoint-cache contamination subsection: documents pre-existing bug in driver/routing/routing_systems.rs where mark_endpoint_unavailable() keys by endpoint.url() and contaminates G1 lookups when G2 fails. Spec mandates either keying by (url, TransportMode) or maintaining a separate g2 unavailable-endpoints map. - Recovery: HalfOpen probe consumed only by G2-eligible op; idle clients accepted to stay on G1 until next op (intentional tradeoff). - Diagnostics: explicit Gateway20DecisionReason enum (Eligible, OperatorDisabled, BreakerOpen, NoGateway20Endpoints, OperationIneligible) plus BreakerSnapshot per attempt. Test plan: 17 unit tests (state machine + concurrency + classifier coverage + topology refresh + mid-op behavior), 9 fault-injection tests (F7 specifically validates G2/G1 endpoint-cache isolation; F8 covers topology refresh mid-cooldown; F9 covers regional-failover precedence), 2 live tests. Manual 'firewall in CI' test deliberately deferred — Java has the same gap. Bookkeeping: - Add 'firewalled' and 'GOAWAY' to sdk/cosmos/.cspell.json (HTTP/2 protocol term + new spec vocabulary). - Add Q4 to Open Questions section linking back to §4.3. 
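The trip condition summarized above can be sketched in Rust as follows. This is a minimal, hypothetical sketch, not the driver's implementation. The `Gateway20Breaker` type, its `record_failure` / `is_open` methods, and the per-endpoint timestamp map are illustrative assumptions. Only these pieces come from the spec summary:

- the TransportFailureClass variant names and their counts/does-not-count split;
- the N=2 distinct endpoints / W=60s threshold and the 2-failures-on-the-sole-endpoint rule;
- the exactly-once Closed->Open transition.

The HalfOpen probe, cooldown, and recovery path are deliberately omitted.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Mirrors the TransportFailureClass variants named in the spec summary.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TransportFailureClass {
    // Setup-phase failures: these count toward a trip.
    DnsFailure,
    ConnectRefused,
    ConnectTimeout,
    TlsHandshakeFailure,
    Http2NegotiationFailure,
    PreResponseReset,
    // Everything else: these never count toward a trip.
    HttpResponse,
    PostSendTimeout,
    MidStreamFailure,
    AuthFailure,
    DecodeError,
    OperationCancelled,
}

impl TransportFailureClass {
    pub fn counts_toward_trip(self) -> bool {
        matches!(
            self,
            Self::DnsFailure
                | Self::ConnectRefused
                | Self::ConnectTimeout
                | Self::TlsHandshakeFailure
                | Self::Http2NegotiationFailure
                | Self::PreResponseReset
        )
    }
}

const CLOSED: u8 = 0;
const OPEN: u8 = 1;

/// Hypothetical client-scoped breaker (Closed -> Open only; HalfOpen omitted).
pub struct Gateway20Breaker {
    state: AtomicU8,
    window: Duration,
    single_endpoint_account: bool,
    /// Timestamps of counting failures, keyed by Gateway 2.0 endpoint.
    failures: Mutex<HashMap<String, Vec<Instant>>>,
}

impl Gateway20Breaker {
    pub fn new(single_endpoint_account: bool) -> Self {
        Self {
            state: AtomicU8::new(CLOSED),
            window: Duration::from_secs(60), // W = 60s
            single_endpoint_account,
            failures: Mutex::new(HashMap::new()),
        }
    }

    /// Read once at operation start to choose the G2 or G1 transport.
    pub fn is_open(&self) -> bool {
        self.state.load(Ordering::Acquire) == OPEN
    }

    /// Records a classified failure. Returns `true` only for the single caller
    /// whose failure performed the Closed -> Open transition.
    pub fn record_failure(&self, endpoint: &str, class: TransportFailureClass, now: Instant) -> bool {
        if !class.counts_toward_trip() {
            return false;
        }
        let threshold_met = {
            let mut failures = self.failures.lock().unwrap();
            failures.entry(endpoint.to_string()).or_default().push(now);
            // Forget failures that fell out of the sliding window.
            for times in failures.values_mut() {
                times.retain(|t| now.saturating_duration_since(*t) <= self.window);
            }
            failures.retain(|_, times| !times.is_empty());
            if self.single_endpoint_account {
                // Sole endpoint: two failures on it within W.
                failures.get(endpoint).map_or(0, Vec::len) >= 2
            } else {
                // Multi-endpoint: failures on N = 2 distinct endpoints within W.
                failures.len() >= 2
            }
        };
        // The atomic CAS guarantees exactly one caller observes the trip.
        threshold_met
            && self
                .state
                .compare_exchange(CLOSED, OPEN, Ordering::AcqRel, Ordering::Acquire)
                .is_ok()
    }
}
```

The mutex keeps the window bookkeeping consistent across concurrent failures. The compare_exchange on the state ensures that only one caller sees `true`, which gives the exactly-one semantics the summary asks for (for example, emitting the warn-log once).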
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- sdk/cosmos/.cspell.json | 2 + .../docs/GATEWAY_20_SPEC.md | 219 ++++++++++++++++++ 2 files changed, 221 insertions(+) diff --git a/sdk/cosmos/.cspell.json b/sdk/cosmos/.cspell.json index 9049b148ecd..85e0888a9a6 100644 --- a/sdk/cosmos/.cspell.json +++ b/sdk/cosmos/.cspell.json @@ -50,6 +50,7 @@ "failback", "failovers", "FILETIME", + "firewalled", "flamegraph", "fract", "francecentral", @@ -59,6 +60,7 @@ "germanynorth", "germanynortheast", "germanywestcentral", + "GOAWAY", "hostnames", "hotfixes", "idents", diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index 7c0db63f7fb..c6fd0ad8edb 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -12,6 +12,9 @@ 2. [Motivation](#2-motivation) 3. [Gating, Configuration & Override](#3-gating-configuration--override) 4. [Retry Behavior](#4-retry-behavior) + - 4.1 [HTTP 449 (Retry-With)](#41-http-449-retry-with--dedicated-policy-separate-from-410gone) + - 4.2 [HTTP 404/1002 (READ_SESSION_NOT_AVAILABLE)](#42-http-404-not-found-with-sub-status-1002-read_session_not_available) + - 4.3 [Connectivity-failure circuit breaker (G2 → G1 fallback)](#43-connectivity-failure-circuit-breaker-g2--g1-fallback) 5. [Rust Implementation Plan](#5-rust-implementation-plan) 6. [Open Questions](#6-open-questions) @@ -149,6 +152,221 @@ These rules apply uniformly to V1 (HTTP) and V2 (RNTBD) — the retry policy ope Beyond 449 and 404/1002, Gateway 2.0 follows the timeout/408 handling defined in `TRANSPORT_PIPELINE_SPEC.md` — no Gateway-2.0-specific override is introduced. +### 4.3 Connectivity-failure circuit breaker (G2 → G1 fallback) + +> **Status**: Design proposal — not yet implemented. 
Implementation is tracked as a follow-up to the Gateway 2.0 enablement work; this section reserves the design surface and the test plan so the implementation lands behaviorally consistent with the rest of the routing rules. + +#### 4.3.1 Problem + +Gateway 2.0 endpoints are advertised by the account on a different host (and may be advertised on a non-443 port — the test fixture in `routing_systems.rs` uses `:444`). Some enterprise networks block outbound TCP to non-443 ports, or block the Gateway 2.0 hostnames specifically. In those environments **every** Gateway 2.0 attempt from the client will fail at the transport layer (TCP refused, TCP timeout, TLS handshake failure, or HTTP/2 negotiation failure) while Gateway V1 on the *standard* gateway host:443 still works. + +Today the `endpoint_is_available()` check in `operation_pipeline.rs` skips a *single* G2 endpoint after we mark it `UnavailableReason::TransportError`, but the retry then immediately picks the *next* G2 endpoint. In a blanket-firewall scenario all G2 endpoints have the same outcome, so the operation exhausts its retry budget on G2 and fails — even though G1 would have succeeded immediately. + +Java and .NET both choose **fail-fast** here: `ThinClientStoreModel` extends `RxGatewayStoreModel` and inherits the standard regional retry stack, with **no automatic G2 → G1 fallback**. (Confirmed by inspection of `Azure/azure-sdk-for-java/sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/ThinClientStoreModel.java` and the equivalent paths in `Azure/azure-cosmos-dotnet-v3`.) The Rust SDK is in a position to do better because it controls the per-attempt transport selection in `resolve_endpoint`, but doing so requires explicit policy and explicit telemetry so customers and SREs can reason about why their client is on G1. + +#### 4.3.2 Design space + +**Option A — Fail-fast (parity with Java/.NET).** +Keep current behavior. 
A single attempt against an unreachable G2 endpoint fails, the endpoint is marked `TransportError`, the retry tries the next region's G2 endpoint, and the operation eventually fails with a transport error if all G2 regions are unreachable. The operator must then explicitly set `gateway20_disabled = true` to route through G1. + +Pros: simple, matches Java/.NET semantics, no new state. Cons: customers in firewalled networks see a hard failure on every operation until they manually opt out, which breaks the "default on" rollout story for Gateway 2.0. + +**Option B — Connectivity-failure circuit breaker (recommended).** +Add a **client-scoped circuit breaker** that observes transport-layer failures across G2 endpoints and, after a trip threshold is met, suppresses Gateway 2.0 routing for subsequent operations on that client. This is functionally equivalent to flipping `gateway20_suppressed = true` at runtime while deliberately *not* mutating the operator-controlled `options.gateway20_disabled` setting (which stays customer-owned). + +The trip is **client-scoped** (not account-scoped, not global) so that two `CosmosClient` instances pointing at different accounts — or even the same account from a different network namespace — do not contaminate each other. + +#### 4.3.3 Recommended design (Option B) + +##### Trip condition + +The breaker MUST trip only on signals that strongly correlate with a network/firewall block, not on signals produced by an outage at a single G2 endpoint.
The classifier produces a closed enum (`TransportFailureClass`) that the breaker reads; downstream code MUST NOT pattern-match on `io::ErrorKind`, error strings, or `azure_core::Error` shapes: + +```text +enum TransportFailureClass { + // counted as connectivity failures + DnsFailure, + ConnectRefused, + ConnectTimeout, + TlsHandshakeFailure, + Http2NegotiationFailure, + PreResponseReset, // RST/EOF before any HTTP/2 response headers + // received, only when the transport layer can + // structurally prove this; otherwise classify + // as `MidStreamFailure` + + // explicitly NOT counted + HttpResponse, // the proxy returned any HTTP response (200..599) + PostSendTimeout, // timeout fired after a request was on the wire + MidStreamFailure, // GOAWAY / stream reset / RST after at least one + // response frame was observed + AuthFailure, + DecodeError, // malformed RNTBD body + OperationCancelled, // user-side cancellation +} +``` + +The classifier MUST live where the error is constructed — `cosmos_transport_client.rs` and `sharded_transport.rs` — so it produces a structured value at the source. Adding a late `is_connectivity_failure(&self) -> bool` accessor over an opaque error is **not** acceptable; it forces string-matching on reqwest/hyper error chains and produces non-deterministic trip/no-trip behavior. + +Notes on edge cases the rubber-duck pass surfaced: + +- **Timeout that fires during connect / TLS / HTTP/2 setup** counts as `ConnectTimeout` / `TlsHandshakeFailure` / `Http2NegotiationFailure` respectively (not `PostSendTimeout`). The transport layer MUST distinguish these by tracking which phase of the connection lifecycle it is in. +- **`Connection reset by peer` before any response headers** counts as `PreResponseReset` ONLY when the transport can prove (e.g. via a per-connection state flag) that no response frame was observed; otherwise classify as `MidStreamFailure`. When in doubt, do NOT count. 
+- **Mid-stream HTTP/2 reset / GOAWAY / stream reset after at least one response frame** is `MidStreamFailure` and does NOT count. The path is open by definition; the proxy is unhealthy. +- **HTTP responses of any status** are `HttpResponse` and do NOT count. A G2 proxy returning 401/403/500/503/etc. proves the network is open. + +##### Trip threshold + +Two separate thresholds — one for multi-endpoint accounts, one for single-endpoint accounts — both expressed against a sliding window of `W` wall-clock seconds: + +- **Multi-endpoint accounts** (`thinClientReadableLocations.len() >= 2`): trip when `N` distinct G2 endpoints (distinct host:port pairs) each raise `>= 1` connectivity failure within `W`. Recommended initial `N = 2`, `W = 60s`. The threshold MUST NOT scale up for accounts with more endpoints — `N = 2` already proves network-scoped trouble; larger `N` only delays escape from the firewall scenario. +- **Single-endpoint accounts** (`thinClientReadableLocations.len() == 1`): trip when the sole endpoint raises `>= 2` connectivity failures within `W`. Two failures on the same endpoint are required to defend against false trips from a single transient timeout. Recommended `W = 60s` (same window). + +The implementation MUST expose `N`, `W`, and `C` (the recovery cooldown, defined under "Recovery") as `pub(crate) const` so they can be tuned in one place. Tuning during the live-test bake-in is tracked under §4.3.5 Q4b. + +##### Trip action and operation-scope contract + +The breaker affects **subsequent logical operations**, not the current in-flight operation. Per §3, transport mode is decided once per logical operation in `resolve_endpoint`, attached to `OperationContext`, and inherited by all retries of that operation. The breaker preserves this contract by being read **only at operation start** (when `RoutingDecision` is computed for the first attempt). Mid-operation retries do NOT re-read breaker state.
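A minimal Rust sketch of the trip condition, the two thresholds, and the read-once-at-start rule follows. It is an illustration under stated assumptions, not the driver's implementation: `TripWindow`, `observe`, `transport_for_new_operation`, and `SINGLE_ENDPOINT_FAILURES` are hypothetical names, endpoints are keyed by a plain `host:port` string, the caller is assumed to pass the number of advertised G2 endpoints, and the concurrency guarantees (CAS transitions, single probe) from the concurrency model are deliberately left out. Only the variant names of `TransportFailureClass` and `TransportMode` come from this section.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Variant names follow the spec's `TransportFailureClass`; the Rust shape is a sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TransportFailureClass {
    DnsFailure,
    ConnectRefused,
    ConnectTimeout,
    TlsHandshakeFailure,
    Http2NegotiationFailure,
    PreResponseReset,
    HttpResponse,
    PostSendTimeout,
    MidStreamFailure,
    AuthFailure,
    DecodeError,
    OperationCancelled,
}

impl TransportFailureClass {
    /// Only the six setup-phase connectivity variants count toward a trip.
    pub fn counts_toward_trip(self) -> bool {
        matches!(
            self,
            Self::DnsFailure
                | Self::ConnectRefused
                | Self::ConnectTimeout
                | Self::TlsHandshakeFailure
                | Self::Http2NegotiationFailure
                | Self::PreResponseReset
        )
    }
}

pub(crate) const N: usize = 2; // distinct failing endpoints (multi-endpoint accounts)
pub(crate) const W: Duration = Duration::from_secs(60); // sliding window
pub(crate) const SINGLE_ENDPOINT_FAILURES: usize = 2; // hypothetical name

/// Hypothetical sliding-window tracker of counted failures per `host:port`.
pub struct TripWindow {
    advertised_endpoints: usize,
    failures: HashMap<String, Vec<Instant>>,
}

impl TripWindow {
    pub fn new(advertised_endpoints: usize) -> Self {
        Self { advertised_endpoints, failures: HashMap::new() }
    }

    /// Records one classified failure and reports whether the trip threshold is met.
    pub fn observe(&mut self, endpoint: &str, class: TransportFailureClass, now: Instant) -> bool {
        if !class.counts_toward_trip() {
            return false; // e.g. any HTTP response proves the path is open
        }
        self.failures.entry(endpoint.to_owned()).or_default().push(now);
        // Age out failures older than W, then drop endpoints with none left.
        for times in self.failures.values_mut() {
            times.retain(|t| now.duration_since(*t) <= W);
        }
        self.failures.retain(|_, times| !times.is_empty());

        if self.advertised_endpoints >= 2 {
            self.failures.len() >= N // distinct endpoints, not repeated failures
        } else {
            self.failures.values().map(Vec::len).sum::<usize>() >= SINGLE_ENDPOINT_FAILURES
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TransportMode {
    Gateway,
    Gateway20,
}

/// Called once when a logical operation starts; retries of that operation reuse the result.
pub fn transport_for_new_operation(breaker_open: bool) -> TransportMode {
    if breaker_open {
        TransportMode::Gateway
    } else {
        TransportMode::Gateway20
    }
}
```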
+ +This means: an operation that is already executing on G2 may observe `N` connectivity failures across its own retries and trip the breaker, but it WILL continue retrying on G2 until its own retry budget is exhausted. Only the *next* logical operation (and all later ones) sees the open breaker and routes via G1. + +When the threshold is hit, the breaker: + +1. Atomically transitions a per-client `breaker_state: Arc<AtomicBreakerState>` from `Closed` to `Open { tripped_at: Instant, opens_again_at: Instant }`. Concurrent failures that all observe `Closed` MUST result in exactly one `Closed → Open` transition (use compare-and-swap or an internal mutex; document which in the implementation). +2. Causes `resolve_endpoint` to compute `use_gateway20 = false` at operation start when `breaker_state.is_open()`. Implementation: `RoutingDecision::transport_mode = TransportMode::Gateway` is forced, and the suppression reason is recorded in diagnostics (see "Diagnostics requirements"). +3. Emits exactly **one** `WARN`-level log line at trip time with: client id, observed endpoints, the underlying `TransportFailureClass` of the most-recent failure per endpoint, and `opens_again_at`. Subsequent operations during the open window MUST NOT emit per-operation warnings (otherwise a busy client floods the log). + +The breaker MUST NOT mutate `options.gateway20_disabled`. The setting remains the only operator-visible control; the breaker is an internal runtime override that decays. + +##### Concurrency model + +`AtomicBreakerState` MUST be safe for concurrent reads from many operation pipelines and concurrent failure observations from many transport callbacks. The implementation MUST guarantee: + +- Exactly one `Closed → Open` transition per trip event (concurrent failures past the threshold do not produce duplicate warn-log lines or duplicate `tripped_at` timestamps).
+- Exactly one `Open → HalfOpen` transition per cooldown elapse (concurrent operations arriving after `opens_again_at` do not all become probes; only the first one wins the probe slot via CAS). +- Exactly one `HalfOpen → Closed` or `HalfOpen → Open` transition per probe outcome (the probe's terminal status drives a single state mutation; in-flight operations that observed `HalfOpen` between probe-start and probe-end route via `Gateway` and do NOT race the probe's resolution). + +Recommended primitive: `Arc>` for state, plus an `AtomicU64` "probe-in-flight" sentinel for the single-probe reservation. Either lock-based or fully lock-free is acceptable; the spec mandates the *behavior*, not the primitive. + +##### Avoiding cross-contamination of the G1 endpoint cache + +The current `mark_endpoint_unavailable()` in `driver/routing/routing_systems.rs:169` and the `endpoint_is_available()` check in `driver/pipeline/operation_pipeline.rs:425` both key the unavailable-endpoints map by `endpoint.url()` — which today returns the **same** per-region URL regardless of whether the request used the G1 or G2 transport. As a result, a G2 transport failure on `westus2` would mark `westus2`'s G1 URL unavailable too, and after the breaker opens the forced-G1 routing would skip the healthy same-region G1 endpoint and fail over cross-region unnecessarily. + +The breaker change MUST address this in one of two ways: + +1. **Preferred**: extend the unavailable-endpoints cache key from `endpoint.url()` to `(endpoint.url(), TransportMode)`, so G2 transport failures only suppress G2 selection on that endpoint and leave G1 selection on the same endpoint unaffected. This also helps non-breaker scenarios — a G2 outage at one endpoint should not block G1 traffic to the same region. +2. **Fallback**: if (1) is too invasive for the breaker change, route G2 transport failures into a separate `unavailable_endpoints_g2` map, and have the breaker read from it. 
The `unavailable_endpoints` map then sees G1-only failures and is read by G1 routing only. + +This is a real bug in the current code (independent of the breaker), surfaced by the breaker design but worth fixing on its own merits. + +##### Recovery + +After cooldown `C` (initial recommendation: `C = 5 * 60` seconds, i.e. 5 minutes), the breaker transitions `Open → HalfOpen` on the next operation that observes the elapsed cooldown. In `HalfOpen` state, exactly one operation is permitted to use Gateway 2.0 as a probe: + +- The probe slot is consumed only by an operation that **would otherwise have routed via G2** (i.e. its `endpoint.uses_gateway20(prefer_gateway20)` returns `true` AND the operation is G2-eligible per `is_operation_supported_by_gateway20()`). Operations that are G2-ineligible (e.g. unsupported resource types) do NOT consume the probe slot; they continue routing via G1 as they normally would, and the breaker stays in `HalfOpen` waiting for an eligible operation. +- **Probe succeeds** (the request reaches the G2 proxy and returns any HTTP response, including 4xx/5xx — the path is open by definition): breaker transitions `HalfOpen → Closed`. Routing returns to normal G2 selection. +- **Probe fails with a connectivity-failure signal** (`TransportFailureClass` ∈ {`DnsFailure`, `ConnectRefused`, `ConnectTimeout`, `TlsHandshakeFailure`, `Http2NegotiationFailure`, `PreResponseReset`}): breaker transitions `HalfOpen → Open` again with a fresh `opens_again_at = Instant::now() + C`. Connectivity failure during the probe restarts the cooldown; it does not need to re-meet the trip threshold. +- **Probe fails for any other reason** (HTTP error, auth failure, RNTBD decode error, mid-stream reset, post-send timeout, cancellation): breaker transitions `HalfOpen → Closed`. The probe proved that the network path is open; the failure is unrelated to the firewall hypothesis.
Note: this is intentional even for HTTP 401 / auth failures isolated to G2 — operationally those are recoverable HTTP errors, not network blocks. (If we later observe customers in this scenario, we can revisit; the conservative call today is "if bytes flowed, the firewall hypothesis is dead".) + +The probe MUST be the user's next eligible operation, **not** a background ping. Background probing requires us to construct a synthetic request that the G2 proxy will accept, which is fragile to wire-protocol changes; piggy-backing on a real operation has none of those concerns. The tradeoff: an idle client whose firewall has been removed will stay on G1 indefinitely until it issues another operation. This is acceptable — an idle client is not paying any latency cost. + +##### Per-operation override (escape hatch) + +Customers running diagnostic / health-check operations who want to force a G2 attempt even while the breaker is open SHOULD be able to opt in per-operation. *Not* via a public option: if customers reach for this routinely, the breaker is mistuned and we should fix the breaker. Provide it only as a `pub(crate)` flag on `OperationContext` for use by internal diagnostic surfaces (e.g. the `azure_data_cosmos_driver::testing` shim, or a future SDK-level health-check API). Opening this up later is API-additive; locking it down later is not. + +##### Diagnostics requirements + +Every retry attempt MUST surface, in the existing diagnostics envelope, both the breaker state and an explicit suppression-reason enum at the time the attempt was scheduled. 
Concretely: + +- Add a `gateway20_decision_reason` field (closed enum) to the per-attempt diagnostics struct (`AttemptDiagnostics` in `diagnostics/mod.rs`): + + ```text + enum Gateway20DecisionReason { + Eligible, // request routed via G2 + OperatorDisabled, // options.gateway20_disabled = true + BreakerOpen, // connectivity-failure breaker is open + NoGateway20Endpoints, // account does not advertise G2 endpoints + OperationIneligible, // is_operation_supported_by_gateway20() = false + } + ``` + + These are the four reasons G2 may be skipped plus the "selected G2" verdict; they are mutually exclusive and exhaustive at the routing decision point. Customer support MUST be able to look at a single attempt's diagnostics and know which one applied. +- Add a `breaker: Option<BreakerSnapshot>` field, where `BreakerSnapshot { state: "closed" | "open" | "half_open", tripped_at: Option<Instant>, opens_again_at: Option<Instant>, recent_failures: u32, recent_failure_classes: Vec<TransportFailureClass> }`. `recent_failure_classes` is the per-endpoint most-recent classification (capped at, e.g., 8 entries) so an operator can verify "yes, all observed failures were `ConnectRefused`" without raw error strings. +- The first attempt that observes `state == "open"` MUST also include the underlying `TransportFailureClass` and a one-line error message per offending endpoint, so a customer support ticket can copy-paste a single diagnostics blob and reach a verdict on "is this a firewall problem?". + +##### Interaction with operator override + +If `options.gateway20_disabled = true`, the breaker is **inert** (it never observes any G2 attempts because none are scheduled). It does not need to be disabled explicitly; this falls out of the implementation. + +If `options.gateway20_disabled = false` (the default), the breaker is active and may flip the effective routing to G1 as described above. + +##### Out of scope (deliberately) + +- **Cross-process sharing.** Each `CosmosClient` instance has its own breaker.
We do not persist breaker state to disk or to a shared cache. +- **Per-region breakers.** A single client-scoped breaker is enough for the firewall use case (firewalls are network-scoped, not region-scoped). Per-region tracking is what `unavailable_endpoints` already does at the endpoint level. +- **Auto-disable across the operator override.** If the operator sets `gateway20_disabled = true`, we never try G2 again until they flip it back. The breaker has no opinion on that. + +#### 4.3.4 Test plan + +The following tests are non-negotiable for the breaker to land. They MUST be added in the same change that adds the breaker, not in a follow-up. + +##### Unit tests — driver crate (`tests/gateway20_pipeline_tests.rs` or a sibling) + +Each test exercises the breaker's state machine in isolation, using synthetic `TransportError` values fed through the classifier. None of these require a live network or a fault-injection proxy. + +| # | Scenario | Expected result | +| --- | --- | --- | +| U1 | Single connectivity failure on one G2 endpoint, breaker stays `Closed`. | `RoutingDecision.transport_mode == Gateway20` for the next operation. | +| U2 | `N = 2` connectivity failures on the same endpoint within `W` (multi-endpoint account), breaker stays `Closed`. | `Gateway20` selected. (Trip requires *distinct* endpoints, not repeated failures of one.) | +| U3 | `N = 2` connectivity failures on two distinct endpoints within `W`, breaker trips to `Open`. | Next op routes via `Gateway`; diagnostics report `gateway20_decision_reason = BreakerOpen`. | +| U4 | `Open` → `HalfOpen` after `C` elapses; next G2-eligible op routes via `Gateway20` as the probe. | Probe is scheduled on G2; subsequent ops (before probe completes) still route via `Gateway`. | +| U5 | `HalfOpen` probe receives HTTP 500 from G2 proxy. | Breaker → `Closed`; subsequent ops route via `Gateway20`. (Server error is not a connectivity failure.) | +| U6 | `HalfOpen` probe fails with `ConnectRefused`. 
| Breaker → `Open`, `opens_again_at = Now + C`; cooldown restarts. | +| U7 | Single-endpoint account; **one** connectivity failure on the sole endpoint. | Breaker stays `Closed`. Per the refined single-endpoint threshold, two failures within `W` are required. | +| U8 | Single-endpoint account; **two** connectivity failures on the sole endpoint within `W`. | Breaker → `Open`. | +| U9 | `options.gateway20_disabled = true`; synthetic connectivity failures fed to the breaker. | Breaker remains `Closed` (no G2 attempts are scheduled to observe). Diagnostics report `gateway20_decision_reason = OperatorDisabled`. | +| U10 | Non-connectivity errors only (`HttpResponse`, `AuthFailure`, `DecodeError`, `MidStreamFailure`, `PostSendTimeout`, `OperationCancelled`). | Breaker stays `Closed`; `recent_failures` does not increment regardless of count. | +| U11 | Sliding-window expiry: fail endpoint A; advance clock by `> W`; fail endpoint B. | Breaker stays `Closed` — the A failure aged out before B was observed. | +| U12 | Concurrent observations: fan out 100 simultaneous failure observations across `N` endpoints from many tasks. | Exactly one `Closed → Open` transition; exactly one warn-log line; `tripped_at` is set exactly once. | +| U13 | Concurrent probe selection: cooldown elapses; 100 simultaneous operations arrive. | Exactly one operation routes via G2 (the probe); the other 99 route via `Gateway`. | +| U14 | Topology refresh: account previously advertised endpoints {A, B}; both fail and breaker trips; account refresh shrinks topology to {A}. | Breaker stays `Open` until cooldown — the prior trip is not invalidated by topology changes. (Documents intentional behavior; alternative is debatable but inverting it requires evidence.) | +| U15 | G2-ineligible operations during `HalfOpen` do not consume the probe. | An ineligible op observes `HalfOpen`, routes via `Gateway` per its own ineligibility, and the breaker remains in `HalfOpen` for the next eligible op. 
| +| U16 | Mid-operation breaker behavior: an in-flight operation on G2 observes `N` connectivity failures across its own retries and trips the breaker. | The current operation continues retrying on G2 until its retry budget is exhausted; only the *next* operation sees `BreakerOpen` and routes via G1. | +| U17 | Per-`TransportFailureClass` classifier coverage (one test per variant). | `DnsFailure`, `ConnectRefused`, `ConnectTimeout`, `TlsHandshakeFailure`, `Http2NegotiationFailure`, `PreResponseReset` increment `recent_failures`; `HttpResponse`, `PostSendTimeout`, `MidStreamFailure`, `AuthFailure`, `DecodeError`, `OperationCancelled` do not. | + +##### Fault-injection tests — emulator (`tests/emulator_tests/cosmos_fault_injection.rs`) + +Inject the listed failures via the existing fault-injection rules (extended in Phase 6) on a real-but-emulated G2 transport. Each test asserts the operation outcome AND the diagnostics envelope. + +| # | Injected fault | Expected result | +| --- | --- | --- | +| F1 | Inject `connect_refused` on every G2 endpoint, enough times to meet the trip threshold. | First operations fail with G2 transport error; the operation issued after the trip succeeds via G1; diagnostics on the post-trip attempt show `gateway20_decision_reason = BreakerOpen` and `breaker.state = "open"`. | +| F2 | Inject TLS handshake failure on every G2 endpoint. | Same as F1; classifier MUST produce `TlsHandshakeFailure`. | +| F3 | Inject HTTP/2 negotiation failure (server returns HTTP/1.x bytes). | Same as F1; classifier MUST produce `Http2NegotiationFailure`. | +| F4 | Inject HTTP 503 (server-side, not transport-side) on every G2 endpoint. | Operations retry per existing 503 policy and may eventually fail; breaker MUST stay `Closed` (the network is fine); diagnostics show `gateway20_decision_reason = Eligible`. | +| F5 | Inject `connect_refused` on G2; after `C` elapses, stop injecting. 
| Probe operation succeeds via G2; subsequent operations route via G2; diagnostics show breaker recovered to `Closed`. | +| F6 | Inject `connect_refused` on G2 *and* `service_unavailable` on G1. | Operations fail (no transport works); breaker is `Open`; diagnostics make both failure causes legible. | +| F7 | Inject `connect_refused` on G2 westus2 only; verify subsequent G1 routing to westus2 succeeds (does not cross-region failover). | Once the breaker is open, a G1-routed op against westus2 MUST succeed — proving G2 transport failures did not poison the G1 unavailable-endpoint cache. | +| F8 | Trip the breaker, then refresh the account metadata mid-cooldown to add/remove G2 endpoints. | Breaker stays `Open` until cooldown regardless of topology changes; once cooldown elapses, the probe runs against the current topology (not the trip-time topology). | +| F9 | Trip the breaker while regional failover state is already active for an unrelated reason. | Breaker-forced G1 routing respects the existing regional failover decision (does not undo it); diagnostics surface both states. | + +##### Live tests — Gateway 2.0 live pipeline (`tests/emulator_tests/gateway20_e2e.rs`) + +The live job already authenticates against a dedicated Gateway 2.0-enabled account (per Q2). Add the following tests, marked with the existing live-only attribute: + +| # | Scenario | Expected result | +| --- | --- | --- | +| L1 | Normal operation against the Gateway 2.0 account. | Diagnostics show `gateway20_decision_reason = Eligible`, `breaker.state = "closed"`. | +| L2 | Set `options.gateway20_disabled = true`, run the same workload. | Diagnostics show `gateway20_decision_reason = OperatorDisabled`, breaker remains inert (`recent_failures = 0`). | + +A "firewall in CI" test (force outbound block to G2 hostnames) is **not** added to the standard live job because it requires container-network manipulation that the SDK live pipeline is not currently set up to perform. 
Java has the same gap (its `Http2ConnectionLifecycleTests` is in the explicitly-excluded `manual-thinclient-network-delay` group). Track as a separate manual test ticket; do not block the breaker landing on it. + +#### 4.3.5 Open sub-questions for this section + +- **Q4a** — Should the breaker fire the first probe **automatically** in the background after `C` instead of waiting for the next user op? (Decision: no, as argued above. Captured here in case the team revisits.) +- **Q4b** — Should the trip threshold `N` and window `W` be tunable via env var? (Default position: no — these are SDK-level safety parameters and customers should not tune them in production. Re-open if the live bake-in shows the defaults are wrong for a specific account topology.) +- **Q4c** — Should the breaker be allowed to *escalate* (e.g. mark the entire account as G2-incompatible in the account-cache so a fresh client picks it up)? (Decision: no, scope creep. Each client makes its own observation.) + --- ## 5. Rust Implementation Plan @@ -628,3 +846,4 @@ A **new dedicated CI pipeline** is required for gateway 2.0 live tests. Gateway - **Q1 — HTTP/2 prior knowledge vs ALPN**: _Resolved_. Gateway 2.0 always uses HTTP/2; the proxy does not accept HTTP/1.x. Rust uses HTTP/2 with prior knowledge on the Gateway 2.0 transport (no ALPN fallback to HTTP/1.x). The broader ALPN default in `TRANSPORT_PIPELINE_SPEC.md` does **not** apply to Gateway 2.0; if HTTP/2 negotiation fails, the request fails and the existing retry policies handle it. - **Q2 — Live test account provisioning**: Cosmos DB account configuration flags required to enable Gateway 2.0 endpoints are not part of the standard Bicep templates. _Resolution_: hardcode a dedicated, pre-provisioned Gateway 2.0 account for the gateway 2.0 live tests pipeline and reuse it across runs (rather than provisioning per-run via Bicep). 
Account name and credentials are stored in pipeline secrets (`AZURE_COSMOS_GW20_ENDPOINT`, `AZURE_COSMOS_GW20_KEY`); the pipeline reads the endpoint from environment variables. - **Q3 — EPK range header names**: _Resolved_. The Gateway 2.0 proxy requires the Java header names `x-ms-thinclient-range-min` / `x-ms-thinclient-range-max`. Phase 2 introduces new constants (`GATEWAY20_RANGE_MIN`, `GATEWAY20_RANGE_MAX`) on the Gateway 2.0 path; the existing `START_EPK` / `END_EPK` (`x-ms-start-epk` / `x-ms-end-epk`) constants remain for any non-Gateway-2.0 callers but are **not** emitted on Gateway 2.0 requests. +- **Q4 — Connectivity-failure G2 → G1 fallback**: _Specified, not yet implemented_. See §4.3 for the full design and test plan. The breaker is **not** part of the initial Gateway 2.0 enablement; it is a follow-up that must land before we can claim "Gateway 2.0 default-on is safe for customers behind firewalls". Open sub-questions for tuning live in §4.3.5. From 55cb1c2c47a0d737988aab25b5950592f139ef49 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Mon, 11 May 2026 14:25:53 -0700 Subject: [PATCH 47/48] Flip §4.3 G2 connectivity-failure recommendation to fail-fast MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The prior commit (45595560d) recommended Option B (client-scoped circuit breaker / auto-fallback). Per follow-up review, the recommendation is reversed to Option A (fail-fast, parity with Java/.NET). Reasons documented in §4.3.2: - Latency surprise: silently switching transport modes mid-workload changes the latency profile in ways the customer didn't ask for. G2 has different latency characteristics than G1; auto-degrading hides a regression customers cannot attribute. - G2 latency SLA: operational guarantees we publish for G2 are tied to G2 staying selected. Auto-degrading violates that contract.
- Hidden state can mask intermittent infrastructure issues that the customer needs to see and react to (firewall mis-configuration is exactly the kind of thing that should bubble up loudly). - Java/.NET parity: neither has auto-fallback. Diverging without strong evidence increases the support matrix. - API-additive deferral: Option B can ship later as a connection-pool option (gateway20_auto_fallback: bool) defaulting to false. Recommendation captured under §4.3.5 Q4a; reopen only with telemetry from the live bake-in showing widespread firewall trouble that the fail-fast hint does not mitigate. What stays from the prior breaker spec (still required under fail-fast): - Structured TransportFailureClass enum classified at error-construction time in driver/transport. Still needed so the SDK can decide whether to attach the firewall hint to the consolidated error. - Endpoint-cache contamination subsection. Pre-existing bug in routing_systems.rs:169 — independent of fail-fast vs breaker, and required for the operator's gateway20_disabled = true opt-out to actually recover affected per-region G1 endpoints. - Gateway20DecisionReason enum on per-attempt diagnostics, minus the BreakerOpen variant. What's new under fail-fast: - Explicit error contract (§4.3.3 'Error contract: when to attach the firewall hint'): hint attached IFF every observed TransportFailureClass is hint-eligible. Mixed outcomes (one ConnectRefused + one HTTP 503) reject the firewall hypothesis and use standard messaging. - Test plan rewritten around fail-fast: 13 unit tests (per-variant classifier coverage; per-class consolidated-error tests; cache isolation; per-decision-reason coverage), 7 fault-injection tests (F7 specifically validates that gateway20_disabled = true recovers the affected per-region G1 endpoints end-to-end), 2 live tests. - Removed: trip-threshold tuning (N/W/C constants), concurrency model, HalfOpen probe semantics, trip-action subsection, BreakerSnapshot diagnostics field. 
TOC and §6 Q4 entries updated to match. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../docs/GATEWAY_20_SPEC.md | 206 ++++++++---------- 1 file changed, 85 insertions(+), 121 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md index c6fd0ad8edb..477e9adf003 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md +++ b/sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md @@ -14,7 +14,7 @@ 4. [Retry Behavior](#4-retry-behavior) - 4.1 [HTTP 449 (Retry-With)](#41-http-449-retry-with--dedicated-policy-separate-from-410gone) - 4.2 [HTTP 404/1002 (READ_SESSION_NOT_AVAILABLE)](#42-http-404-not-found-with-sub-status-1002-read_session_not_available) - - 4.3 [Connectivity-failure circuit breaker (G2 → G1 fallback)](#43-connectivity-failure-circuit-breaker-g2--g1-fallback) + - 4.3 [Fail-fast on Gateway 2.0 transport failures](#43-fail-fast-on-gateway-20-transport-failures) 5. [Rust Implementation Plan](#5-rust-implementation-plan) 6. [Open Questions](#6-open-questions) @@ -152,7 +152,7 @@ These rules apply uniformly to V1 (HTTP) and V2 (RNTBD) — the retry policy ope Beyond 449 and 404/1002, Gateway 2.0 follows the timeout/408 handling defined in `TRANSPORT_PIPELINE_SPEC.md` — no Gateway-2.0-specific override is introduced. -### 4.3 Connectivity-failure circuit breaker (G2 → G1 fallback) +### 4.3 Fail-fast on Gateway 2.0 transport failures > **Status**: Design proposal — not yet implemented. Implementation is tracked as a follow-up to the Gateway 2.0 enablement work; this section reserves the design surface and the test plan so the implementation lands behaviorally consistent with the rest of the routing rules. 
@@ -160,31 +160,51 @@ Beyond 449 and 404/1002, Gateway 2.0 follows the timeout/408 handling defined in Gateway 2.0 endpoints are advertised by the account on a different host (and may be advertised on a non-443 port — the test fixture in `routing_systems.rs` uses `:444`). Some enterprise networks block outbound TCP to non-443 ports, or block the Gateway 2.0 hostnames specifically. In those environments **every** Gateway 2.0 attempt from the client will fail at the transport layer (TCP refused, TCP timeout, TLS handshake failure, or HTTP/2 negotiation failure) while Gateway V1 on the *standard* gateway host:443 still works. -Today the `endpoint_is_available()` check in `operation_pipeline.rs` skips a *single* G2 endpoint after we mark it `UnavailableReason::TransportError`, but the retry then immediately picks the *next* G2 endpoint. In a blanket-firewall scenario all G2 endpoints have the same outcome, so the operation exhausts its retry budget on G2 and fails — even though G1 would have succeeded immediately. +Today the `endpoint_is_available()` check in `operation_pipeline.rs` skips a *single* G2 endpoint after we mark it `UnavailableReason::TransportError`, but the retry then immediately picks the *next* G2 endpoint. In a blanket-firewall scenario all G2 endpoints have the same outcome, so the operation exhausts its retry budget on G2 and fails, even though G1 would have succeeded immediately. The customer then sees a generic transport error with no clear remediation. -Java and .NET both choose **fail-fast** here: `ThinClientStoreModel` extends `RxGatewayStoreModel` and inherits the standard regional retry stack, with **no automatic G2 → G1 fallback**. (Confirmed by inspection of `Azure/azure-sdk-for-java/sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/ThinClientStoreModel.java` and the equivalent paths in `Azure/azure-cosmos-dotnet-v3`.)
The Rust SDK is in a position to do better because it controls the per-attempt transport selection in `resolve_endpoint`, but doing so requires explicit policy and explicit telemetry so customers and SREs can reason about why their client is on G1. +Java and .NET both choose **fail-fast** here: `ThinClientStoreModel` extends `RxGatewayStoreModel` and inherits the standard regional retry stack, with **no automatic G2 → G1 fallback**. (Confirmed by inspection of `Azure/azure-sdk-for-java/sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/ThinClientStoreModel.java` and the equivalent paths in `Azure/azure-cosmos-dotnet-v3`.) #### 4.3.2 Design space -**Option A — Fail-fast (parity with Java/.NET).** -Keep current behavior. A single attempt against an unreachable G2 endpoint fails, the endpoint is marked `TransportError`, the retry tries the next region's G2 endpoint, and the operation eventually fails with a transport error if all G2 regions are unreachable. The operator must then explicitly set `gateway20_disabled = true` to route through G1. +**Option A — Fail-fast (recommended; parity with Java/.NET).** +Keep the regional-retry behavior. A single attempt against an unreachable G2 endpoint fails, the endpoint is marked `TransportError`, the retry tries the next region's G2 endpoint, and the operation eventually fails with a transport error if all G2 regions are unreachable. The error and diagnostics surface a clear, actionable hint pointing the operator at `options.gateway20_disabled = true`. The operator must then explicitly opt out to route through G1. -Pros: simple, matches Java/.NET semantics, no new state. Cons: customers in firewalled networks see a hard failure on every operation until they manually opt out, which breaks the "default on" rollout story for Gateway 2.0. +Pros: simple, matches Java/.NET semantics, no new state, no concurrency contract, no probe semantics, no recovery logic. 
Customers behind firewalls get a single deterministic verdict and a one-line remediation. -**Option B — Connectivity-failure circuit breaker (recommended).** -Add a **client-scoped circuit breaker** that observes transport-layer failures across G2 endpoints and, after a trip threshold is met, suppresses Gateway 2.0 routing for subsequent operations on that client. This is functionally equivalent to flipping `gateway20_suppressed = true` at runtime, and deliberately *not* mutating the operator-controlled `options.gateway20_disabled` setting (which stays customer-owned). +Cons: customers in firewalled networks see a hard failure on every operation until they manually opt out. We mitigate this with a discoverable error message and connection-pool documentation. -The trip is **client-scoped** (not account-scoped, not global) so that two `CosmosClient` instances pointing at different accounts — or even the same account from a different network namespace — do not contaminate each other. +**Option B — Connectivity-failure circuit breaker (alternative; not taken at this time).** +Add a client-scoped circuit breaker that observes transport-layer failures across G2 endpoints and, after a trip threshold is met, suppresses Gateway 2.0 routing for subsequent operations on that client. This is functionally equivalent to flipping `gateway20_suppressed = true` at runtime. -#### 4.3.3 Recommended design (Option B) +Reasons it is not taken in this iteration: -##### Trip condition +- **Latency surprise**: silently switching transport modes mid-workload changes the latency profile in ways the customer didn't ask for. Gateway 2.0 has different latency characteristics than Gateway V1; customers who are tuning their workload around G2 latency would experience a hidden regression they cannot attribute. +- **G2 latency SLA**: the operational guarantees we publish for G2 are tied to G2 staying selected. Auto-degrading violates that contract for the affected client.
+- **Hidden state** can mask intermittent infrastructure issues that the customer needs to see and react to (firewall misconfiguration is exactly the kind of thing that should bubble up loudly). +- **Java/.NET parity**: neither has auto-fallback. Diverging without strong evidence increases the support matrix and customer confusion when migrating SDKs. +- **API-additive deferral**: Option B can be added later as a connection-pool option (e.g. `gateway20_auto_fallback: bool`) defaulting to `false`. The fail-fast contract does not foreclose that path. -The breaker MUST trip only on signals that strongly correlate with a network/firewall block, not on signals that an outage at one G2 endpoint is producing. The classifier produces a closed enum (`TransportFailureClass`) that the breaker reads; downstream code MUST NOT pattern-match on `io::ErrorKind`, error strings, or `azure_core::Error` shapes: +Option B is parked under §4.3.5 Q4a. Reopen only with telemetry from the live bake-in showing widespread firewall scenarios that the fail-fast hint does not mitigate. + +#### 4.3.3 Recommended design (Option A) + +##### Behavior summary + +1. Per-region retry within G2 follows the existing regional retry stack: `UnavailableReason::TransportError` marks the failed endpoint, and the next G2 endpoint is tried. +2. When the operation has tried at least one G2 endpoint per region in `thinClientReadableLocations` and **every** observed transport failure is connectivity-class, the operation MUST fail with a single consolidated error that: + - Identifies the per-attempt `TransportFailureClass`. + - Lists the endpoints attempted. + - Includes the explicit guidance: "All Gateway 2.0 endpoints failed with connectivity-class errors. Set `options.gateway20_disabled = true` to disable Gateway 2.0 if your network blocks the Gateway 2.0 hostnames or ports." +3. The operation MUST NOT automatically fall back to Gateway V1. The operator picks the transport explicitly. +4.
Mixed-failure outcomes (some G2 attempts return HTTP responses, others fail at the transport layer) MUST NOT include the firewall hint — at least one path proved open and the firewall hypothesis is rejected. + +##### Trip signal classification + +We still need a structured classifier so the SDK can decide whether to attach the firewall hint. Use a closed `TransportFailureClass` enum produced at error-construction time in `cosmos_transport_client.rs` and `sharded_transport.rs` — not a late `is_connectivity_failure() -> bool` over an opaque error. ```text enum TransportFailureClass { - // counted as connectivity failures + // hint-eligible (network/firewall-class signals) DnsFailure, ConnectRefused, ConnectTimeout, @@ -195,7 +215,7 @@ enum TransportFailureClass { // structurally prove this; otherwise classify // as `MidStreamFailure` - // explicitly NOT counted + // NOT hint-eligible HttpResponse, // the proxy returned any HTTP response (200..599) PostSendTimeout, // timeout fired after a request was on the wire MidStreamFailure, // GOAWAY / stream reset / RST after at least one @@ -206,133 +226,79 @@ enum TransportFailureClass { } ``` -The classifier MUST live where the error is constructed — `cosmos_transport_client.rs` and `sharded_transport.rs` — so it produces a structured value at the source. Adding a late `is_connectivity_failure(&self) -> bool` accessor over an opaque error is **not** acceptable; it forces string-matching on reqwest/hyper error chains and produces non-deterministic trip/no-trip behavior. +The classifier MUST live where the error is constructed so it produces a structured value at the source. Adding a late `is_connectivity_failure(&self) -> bool` accessor over an opaque error is **not** acceptable; it forces string-matching on reqwest/hyper error chains and produces non-deterministic verdicts. 
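The classifier and the firewall-hint rule can be sketched in Rust as follows. This is a minimal illustration under stated assumptions, not the driver's implementation: the variant names come from the `TransportFailureClass` listing and the error contract in this section, while `is_hint_eligible`, the `G2Attempt` record, and `should_attach_firewall_hint` are hypothetical names introduced here. The sketch assumes each failed G2 attempt has already been classified where the error was constructed and tagged with the region it targeted.

```rust
/// Closed classification produced where the transport error is constructed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TransportFailureClass {
    // Hint-eligible: network/firewall-class signals.
    DnsFailure,
    ConnectRefused,
    ConnectTimeout,
    TlsHandshakeFailure,
    Http2NegotiationFailure,
    PreResponseReset,
    // Not hint-eligible: the path was shown to be open, or the failure is unrelated.
    HttpResponse,
    PostSendTimeout,
    MidStreamFailure,
    AuthFailure,
    DecodeError,
    OperationCancelled,
}

impl TransportFailureClass {
    /// Hypothetical helper. There is no wildcard arm, so adding a variant
    /// fails to compile until its eligibility is decided explicitly.
    pub fn is_hint_eligible(self) -> bool {
        use TransportFailureClass::*;
        match self {
            DnsFailure | ConnectRefused | ConnectTimeout | TlsHandshakeFailure
            | Http2NegotiationFailure | PreResponseReset => true,
            HttpResponse | PostSendTimeout | MidStreamFailure | AuthFailure | DecodeError
            | OperationCancelled => false,
        }
    }
}

/// Hypothetical per-attempt record: the G2 region targeted and how the attempt ended.
pub struct G2Attempt<'a> {
    pub region: &'a str,
    pub class: TransportFailureClass,
}

/// Hypothetical predicate for the error contract: attach the firewall hint only
/// when every advertised G2 region was tried and every attempt was hint-eligible.
pub fn should_attach_firewall_hint(attempts: &[G2Attempt<'_>], g2_regions: &[&str]) -> bool {
    // No advertised G2 regions or no attempts: the operation never routed through G2.
    if g2_regions.is_empty() || attempts.is_empty() {
        return false;
    }
    // Regional retry must be exhausted: at least one attempt per advertised region.
    let every_region_tried = g2_regions
        .iter()
        .all(|region| attempts.iter().any(|a| a.region == *region));
    // A single non-eligible attempt (e.g. an HTTP 503) rejects the firewall hypothesis.
    let all_hint_eligible = attempts.iter().all(|a| a.class.is_hint_eligible());
    every_region_tried && all_hint_eligible
}
```

With two regions that both fail with `ConnectRefused` (the U3 case) the predicate returns `true`; if one region instead returns an HTTP 503 (the U6 case), `all` sees an `HttpResponse` attempt and the hint is withheld. Keeping the decision a pure function over already-classified attempts means it never inspects `io::ErrorKind` or error strings, which matches the "classify at the source" requirement.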
-Notes on edge cases the rubber-duck pass surfaced: +Edge cases the rubber-duck pass surfaced: - **Timeout that fires during connect / TLS / HTTP/2 setup** counts as `ConnectTimeout` / `TlsHandshakeFailure` / `Http2NegotiationFailure` respectively (not `PostSendTimeout`). The transport layer MUST distinguish these by tracking which phase of the connection lifecycle it is in. -- **`Connection reset by peer` before any response headers** counts as `PreResponseReset` ONLY when the transport can prove (e.g. via a per-connection state flag) that no response frame was observed; otherwise classify as `MidStreamFailure`. When in doubt, do NOT count. -- **Mid-stream HTTP/2 reset / GOAWAY / stream reset after at least one response frame** is `MidStreamFailure` and does NOT count. The path is open by definition; the proxy is unhealthy. -- **HTTP responses of any status** are `HttpResponse` and do NOT count. A G2 proxy returning 401/403/500/503/etc. proves the network is open. - -##### Trip threshold - -Two separate thresholds — one for multi-endpoint accounts, one for single-endpoint accounts — both expressed against a sliding window of `W` wall-clock seconds: - -- **Multi-endpoint accounts** (`thinClientReadableLocations.len() >= 2`): trip when `N` distinct G2 endpoints (distinct host:port pairs) each raise `>= 1` connectivity failure within `W`. Recommended initial `N = 2`, `W = 60s`. The threshold MUST NOT scale up for accounts with more endpoints — `N = 2` already proves network-scoped trouble; larger `N` only delays escape from the firewall scenario. -- **Single-endpoint accounts** (`thinClientReadableLocations.len() == 1`): trip when the sole endpoint raises `>= 2` connectivity failures within `W`. Two failures on the same endpoint are required to defend against false trips from a single transient timeout. Recommended `W = 60s` (same window). - -The implementation MUST expose `N`, `W`, and `C` as `pub(crate) const` so they can be tuned in one place. 
Tuning during the live-test bake-in is tracked under §4.3.5 Q4b. - -##### Trip action and operation-scope contract - -The breaker affects **subsequent logical operations**, not the current in-flight operation. Per §3, transport mode is decided once per logical operation in `resolve_endpoint`, attached to `OperationContext`, and inherited by all retries of that operation. The breaker preserves this contract by being read **only at operation start** (when `RoutingDecision` is computed for the first attempt). Mid-operation retries do NOT re-read breaker state. - -This means: an operation that is already executing on G2 may observe `N` connectivity failures across its own retries and trip the breaker, but it WILL continue retrying on G2 until its own retry budget is exhausted. Only the *next* logical operation (and all later ones) sees the open breaker and routes via G1. - -When the threshold is hit, the breaker: +- **`Connection reset by peer` before any response headers** counts as `PreResponseReset` ONLY when the transport can prove (e.g. via a per-connection state flag) that no response frame was observed; otherwise classify as `MidStreamFailure`. When in doubt, do NOT classify as connectivity. +- **Mid-stream HTTP/2 reset / GOAWAY / stream reset after at least one response frame** is `MidStreamFailure` and is NOT hint-eligible — the path is open by definition, the proxy is unhealthy. +- **HTTP responses of any status** are `HttpResponse` and are NOT hint-eligible. A G2 proxy returning 401/403/500/503/etc. proves the network is open. -1. Atomically transitions a per-client `breaker_state: Arc` from `Closed` to `Open { tripped_at: Instant, opens_again_at: Instant }`. Concurrent failures that all observe `Closed` MUST result in exactly one `Closed → Open` transition (use compare-and-swap or an internal mutex; document which in the implementation). -2. Causes `resolve_endpoint` to compute `use_gateway20 = false` at operation start when `breaker_state.is_open()`. 
Implementation: `RoutingDecision::transport_mode = TransportMode::Gateway` is forced, and the suppression reason is recorded in diagnostics (see "Diagnostics requirements"). -3. Emits exactly **one** `WARN`-level log line at trip time with: client id, observed endpoints, the underlying `TransportFailureClass` of the most-recent failure per endpoint, and `opens_again_at`. Subsequent operations during the open window MUST NOT emit per-operation warnings (otherwise a busy client floods the log). +##### Error contract: when to attach the firewall hint -The breaker MUST NOT mutate `options.gateway20_disabled`. The setting remains the only operator-visible control; the breaker is an internal runtime override that decays. +The SDK MUST attach the "set `gateway20_disabled = true`" hint to the consolidated error if and only if **all** of the following hold: -##### Concurrency model +1. The operation routed through G2 (was G2-eligible per `is_operation_supported_by_gateway20()` and the account advertises G2 endpoints). +2. The operation tried at least one G2 endpoint per region in `thinClientReadableLocations` (i.e. exhausted regional retry). +3. **Every** observed `TransportFailureClass` in the per-attempt diagnostics is hint-eligible (one of `DnsFailure`, `ConnectRefused`, `ConnectTimeout`, `TlsHandshakeFailure`, `Http2NegotiationFailure`, `PreResponseReset`). +4. **No** attempt observed a non-hint-eligible class (`HttpResponse`, `PostSendTimeout`, `MidStreamFailure`, `AuthFailure`, `DecodeError`, `OperationCancelled`). -`AtomicBreakerState` MUST be safe for concurrent reads from many operation pipelines and concurrent failure observations from many transport callbacks. The implementation MUST guarantee: - -- Exactly one `Closed → Open` transition per trip event (concurrent failures past the threshold do not produce duplicate warn-log lines or duplicate `tripped_at` timestamps). 
-- Exactly one `Open → HalfOpen` transition per cooldown elapse (concurrent operations arriving after `opens_again_at` do not all become probes; only the first one wins the probe slot via CAS). -- Exactly one `HalfOpen → Closed` or `HalfOpen → Open` transition per probe outcome (the probe's terminal status drives a single state mutation; in-flight operations that observed `HalfOpen` between probe-start and probe-end route via `Gateway` and do NOT race the probe's resolution). - -Recommended primitive: `Arc>` for state, plus an `AtomicU64` "probe-in-flight" sentinel for the single-probe reservation. Either lock-based or fully lock-free is acceptable; the spec mandates the *behavior*, not the primitive. +If any single attempt observed a non-hint-eligible class, the consolidated error uses standard messaging — the firewall hypothesis is rejected. ##### Avoiding cross-contamination of the G1 endpoint cache -The current `mark_endpoint_unavailable()` in `driver/routing/routing_systems.rs:169` and the `endpoint_is_available()` check in `driver/pipeline/operation_pipeline.rs:425` both key the unavailable-endpoints map by `endpoint.url()` — which today returns the **same** per-region URL regardless of whether the request used the G1 or G2 transport. As a result, a G2 transport failure on `westus2` would mark `westus2`'s G1 URL unavailable too, and after the breaker opens the forced-G1 routing would skip the healthy same-region G1 endpoint and fail over cross-region unnecessarily. - -The breaker change MUST address this in one of two ways: - -1. **Preferred**: extend the unavailable-endpoints cache key from `endpoint.url()` to `(endpoint.url(), TransportMode)`, so G2 transport failures only suppress G2 selection on that endpoint and leave G1 selection on the same endpoint unaffected. This also helps non-breaker scenarios — a G2 outage at one endpoint should not block G1 traffic to the same region. -2. 
**Fallback**: if (1) is too invasive for the breaker change, route G2 transport failures into a separate `unavailable_endpoints_g2` map, and have the breaker read from it. The `unavailable_endpoints` map then sees G1-only failures and is read-by G1 routing only. - -This is a real bug in the current code (independent of the breaker), surfaced by the breaker design but worth fixing on its own merits. - -##### Recovery +The current `mark_endpoint_unavailable()` in `driver/routing/routing_systems.rs:169` and the `endpoint_is_available()` check in `driver/pipeline/operation_pipeline.rs:425` both key the unavailable-endpoints map by `endpoint.url()` — which today returns the **same** per-region URL regardless of whether the request used the G1 or G2 transport. As a result, a G2 transport failure on `westus2` would mark `westus2`'s G1 URL unavailable too. After the operator opts out via `gateway20_disabled = true` (the action this spec asks them to take), the next G1 operation would skip the healthy same-region G1 endpoint and fail over cross-region unnecessarily. -After cooldown `C` (initial recommendation: `C = 5 * 60` seconds, i.e. 5 minutes), the breaker transitions `Open → HalfOpen` on the next operation that observes the elapsed cooldown. In `HalfOpen` state, exactly one operation is permitted to use Gateway 2.0 as a probe: +The fail-fast change MUST address this in one of two ways: -- The probe slot is consumed only by an operation that **would otherwise have routed via G2** (i.e. its `endpoint.uses_gateway20(prefer_gateway20)` returns `true` AND the operation is G2-eligible per `is_operation_supported_by_gateway20()`). Operations that are G2-ineligible (e.g. unsupported resource types) do NOT consume the probe slot; they continue routing via G1 as they normally would, and the breaker stays in `HalfOpen` waiting for an eligible operation. 
-- **Probe succeeds** (the request reaches the G2 proxy and returns any HTTP response, including 4xx/5xx — the path is open by definition): breaker transitions `HalfOpen → Closed`. Routing returns to normal G2 selection. -- **Probe fails with a connectivity-failure signal** (`TransportFailureClass` ∈ {`DnsFailure`, `ConnectRefused`, `ConnectTimeout`, `TlsHandshakeFailure`, `Http2NegotiationFailure`, `PreResponseReset`}): breaker transitions `HalfOpen → Open` again with a fresh `opens_again_at = Instant::now() + C`. Connectivity failure during the probe restarts the cooldown; it does not need to re-meet the trip threshold. -- **Probe fails for any other reason** (HTTP error, auth failure, RNTBD decode error, mid-stream reset, post-send timeout, cancellation): breaker transitions `HalfOpen → Closed`. The probe proved that the network path is open; the failure is unrelated to the firewall hypothesis. Note: this is intentional even for HTTP 401 / auth failures isolated to G2 — operationally those are recoverable HTTP errors, not network blocks. (If we later observe customers in this scenario, we can revisit; the conservative call today is "if bytes flowed, the firewall hypothesis is dead".) +1. **Preferred**: extend the unavailable-endpoints cache key from `endpoint.url()` to `(endpoint.url(), TransportMode)`, so G2 transport failures only suppress G2 selection on that endpoint and leave G1 selection on the same endpoint unaffected. This also helps non-firewall scenarios — a G2 outage at one endpoint should not block G1 traffic to the same region. +2. **Fallback**: route G2 transport failures into a separate `unavailable_endpoints_g2` map; the existing `unavailable_endpoints` map then sees G1-only failures and is read by G1 routing only. -The probe MUST be the user's next eligible operation, **not** a background ping. 
Background probing requires us to construct a synthetic request that the G2 proxy will accept, which is fragile to wire-protocol changes; piggy-backing on a real operation has none of those concerns. The tradeoff: an idle client whose firewall has been removed will stay on G1 indefinitely until it issues another operation. This is acceptable — an idle client is not paying any latency cost. - -##### Per-operation override (escape hatch) - -Customers running diagnostic / health-check operations who want to force a G2 attempt even while the breaker is open SHOULD be able to opt in per-operation. *Not* via a public option: if customers reach for this routinely, the breaker is mistuned and we should fix the breaker. Provide it only as a `pub(crate)` flag on `OperationContext` for use by internal diagnostic surfaces (e.g. the `azure_data_cosmos_driver::testing` shim, or a future SDK-level health-check API). Opening this up later is API-additive; locking it down later is not. +This is a real bug in the current code (independent of the fail-fast decision), surfaced by this design but worth fixing on its own merits — and required for the operator's `gateway20_disabled = true` opt-out to actually recover affected per-region G1 endpoints. ##### Diagnostics requirements -Every retry attempt MUST surface, in the existing diagnostics envelope, both the breaker state and an explicit suppression-reason enum at the time the attempt was scheduled. 
Concretely: +Every retry attempt MUST surface in the diagnostics envelope: -- Add a `gateway20_decision_reason` field (closed enum) to the per-attempt diagnostics struct (`AttemptDiagnostics` in `diagnostics/mod.rs`): +- A `gateway20_decision_reason` field (closed enum) recording why this attempt did or did not route via G2: ```text enum Gateway20DecisionReason { Eligible, // request routed via G2 OperatorDisabled, // options.gateway20_disabled = true - BreakerOpen, // connectivity-failure breaker is open NoGateway20Endpoints, // account does not advertise G2 endpoints OperationIneligible, // is_operation_supported_by_gateway20() = false } ``` - These are the four reasons G2 may be skipped plus the "selected G2" verdict; they are mutually exclusive and exhaustive at the routing decision point. Customer support MUST be able to look at a single attempt's diagnostics and know which one applied. -- Add a `breaker: Option` field, where `BreakerSnapshot { state: "closed" | "open" | "half_open", tripped_at: Option, opens_again_at: Option, recent_failures: u32, recent_failure_classes: Vec }`. `recent_failure_classes` is the per-endpoint most-recent classification (capped at e.g. 8 entries) so an operator can verify "yes, all observed failures were `ConnectRefused`" without raw error strings. -- The first attempt that observes `state == "open"` MUST also include the underlying `TransportFailureClass` and a one-line error message per offending endpoint, so a customer support ticket can copy-paste a single diagnostics blob and reach a verdict on "is this a firewall problem?". - -##### Interaction with operator override + These four cases are mutually exclusive and exhaustive at the routing decision point. Customer support MUST be able to look at a single attempt's diagnostics and know which one applied. -If `options.gateway20_disabled = true`, the breaker is **inert** (it never observes any G2 attempts because none are scheduled). 
It does not need to be disabled explicitly; this falls out of the implementation. +- For attempts that produced a transport error: the `TransportFailureClass` of the error. -If `options.gateway20_disabled = false` (the default), the breaker is active and may flip the effective routing to G1 as described above. - -##### Out of scope (deliberately) - -- **Cross-process sharing.** Each `CosmosClient` instance has its own breaker. We do not persist breaker state to disk or to a shared cache. -- **Per-region breakers.** A single client-scoped breaker is enough for the firewall use case (firewalls are network-scoped, not region-scoped). Per-region tracking is what `unavailable_endpoints` already does at the endpoint level. -- **Auto-disable across the operator override.** If the operator sets `gateway20_disabled = true`, we never try G2 again until they flip it back. The breaker has no opinion on that. +The consolidated operation error MUST cite the per-attempt diagnostics so the firewall hypothesis can be verified from a single trace blob. #### 4.3.4 Test plan -The following tests are non-negotiable for the breaker to land. They MUST be added in the same change that adds the breaker, not in a follow-up. +The following tests are non-negotiable for the fail-fast contract to land. They MUST be added in the same change that adds the `TransportFailureClass` classifier and the firewall-hint contract. ##### Unit tests — driver crate (`tests/gateway20_pipeline_tests.rs` or a sibling) -Each test exercises the breaker's state machine in isolation, using synthetic `TransportError` values fed through the classifier. None of these require a live network or a fault-injection proxy. - | # | Scenario | Expected result | | --- | --- | --- | -| U1 | Single connectivity failure on one G2 endpoint, breaker stays `Closed`. | `RoutingDecision.transport_mode == Gateway20` for the next operation. 
| -| U2 | `N = 2` connectivity failures on the same endpoint within `W` (multi-endpoint account), breaker stays `Closed`. | `Gateway20` selected. (Trip requires *distinct* endpoints, not repeated failures of one.) | -| U3 | `N = 2` connectivity failures on two distinct endpoints within `W`, breaker trips to `Open`. | Next op routes via `Gateway`; diagnostics report `gateway20_decision_reason = BreakerOpen`. | -| U4 | `Open` → `HalfOpen` after `C` elapses; next G2-eligible op routes via `Gateway20` as the probe. | Probe is scheduled on G2; subsequent ops (before probe completes) still route via `Gateway`. | -| U5 | `HalfOpen` probe receives HTTP 500 from G2 proxy. | Breaker → `Closed`; subsequent ops route via `Gateway20`. (Server error is not a connectivity failure.) | -| U6 | `HalfOpen` probe fails with `ConnectRefused`. | Breaker → `Open`, `opens_again_at = Now + C`; cooldown restarts. | -| U7 | Single-endpoint account; **one** connectivity failure on the sole endpoint. | Breaker stays `Closed`. Per the refined single-endpoint threshold, two failures within `W` are required. | -| U8 | Single-endpoint account; **two** connectivity failures on the sole endpoint within `W`. | Breaker → `Open`. | -| U9 | `options.gateway20_disabled = true`; synthetic connectivity failures fed to the breaker. | Breaker remains `Closed` (no G2 attempts are scheduled to observe). Diagnostics report `gateway20_decision_reason = OperatorDisabled`. | -| U10 | Non-connectivity errors only (`HttpResponse`, `AuthFailure`, `DecodeError`, `MidStreamFailure`, `PostSendTimeout`, `OperationCancelled`). | Breaker stays `Closed`; `recent_failures` does not increment regardless of count. | -| U11 | Sliding-window expiry: fail endpoint A; advance clock by `> W`; fail endpoint B. | Breaker stays `Closed` — the A failure aged out before B was observed. | -| U12 | Concurrent observations: fan out 100 simultaneous failure observations across `N` endpoints from many tasks. 
| Exactly one `Closed → Open` transition; exactly one warn-log line; `tripped_at` is set exactly once. | -| U13 | Concurrent probe selection: cooldown elapses; 100 simultaneous operations arrive. | Exactly one operation routes via G2 (the probe); the other 99 route via `Gateway`. | -| U14 | Topology refresh: account previously advertised endpoints {A, B}; both fail and breaker trips; account refresh shrinks topology to {A}. | Breaker stays `Open` until cooldown — the prior trip is not invalidated by topology changes. (Documents intentional behavior; alternative is debatable but inverting it requires evidence.) | -| U15 | G2-ineligible operations during `HalfOpen` do not consume the probe. | An ineligible op observes `HalfOpen`, routes via `Gateway` per its own ineligibility, and the breaker remains in `HalfOpen` for the next eligible op. | -| U16 | Mid-operation breaker behavior: an in-flight operation on G2 observes `N` connectivity failures across its own retries and trips the breaker. | The current operation continues retrying on G2 until its retry budget is exhausted; only the *next* operation sees `BreakerOpen` and routes via G1. | -| U17 | Per-`TransportFailureClass` classifier coverage (one test per variant). | `DnsFailure`, `ConnectRefused`, `ConnectTimeout`, `TlsHandshakeFailure`, `Http2NegotiationFailure`, `PreResponseReset` increment `recent_failures`; `HttpResponse`, `PostSendTimeout`, `MidStreamFailure`, `AuthFailure`, `DecodeError`, `OperationCancelled` do not. | +| U1 | `TransportFailureClass` classifier per-variant coverage. | Each `io::Error` / `reqwest::Error` shape produces the correct closed-enum variant at construction time. | +| U2 | Single connectivity failure on one G2 endpoint, account has multiple regions. | Operation succeeds via the next region's G2 endpoint (existing regional retry); no firewall hint surfaced. | +| U3 | All G2 endpoints fail with `ConnectRefused`. 
| Operation fails with consolidated transport error; firewall hint included; per-attempt diagnostics list every endpoint attempted. | +| U4 | All G2 endpoints fail with mixed connectivity classes (`DnsFailure` + `TlsHandshakeFailure`). | Operation fails with consolidated transport error; firewall hint included (all classes are hint-eligible). | +| U5 | All G2 endpoints fail with HTTP 503 (`HttpResponse`). | Operation fails per existing 503-exhausted policy; **no** firewall hint (the path was open). | +| U6 | Mixed: westus2 G2 fails with `ConnectRefused`, westus3 G2 returns HTTP 503. | Operation fails per existing 503 policy; **no** firewall hint (the proof of an open path on westus3 rejects the firewall hypothesis). | +| U7 | All G2 endpoints fail with `MidStreamFailure` (GOAWAY mid-response). | Operation fails per existing transport-error policy; **no** firewall hint (mid-stream proves the path was open). | +| U8 | All G2 endpoints fail with `PostSendTimeout`. | Operation fails per existing timeout policy; **no** firewall hint (request reached the wire). | +| U9 | `options.gateway20_disabled = true`. | No G2 attempts are scheduled; operation routes via G1 from the start; diagnostics report `gateway20_decision_reason = OperatorDisabled`. | +| U10 | G2-ineligible operation type, account advertises G2 endpoints. | Operation routes via G1; diagnostics report `gateway20_decision_reason = OperationIneligible`; no firewall hint regardless of G1 outcome. | +| U11 | Account does not advertise G2 endpoints. | Operation routes via G1; diagnostics report `gateway20_decision_reason = NoGateway20Endpoints`; no firewall hint. | +| U12 | After a G2 transport failure on westus2, the operator sets `gateway20_disabled = true` and issues a G1 operation against westus2. | G1 operation routes to westus2 successfully — proves G2 transport failures do not poison the G1 unavailable-endpoint cache (validates the cache-isolation fix). 
| +| U13 | Single-endpoint account; sole G2 endpoint fails with `ConnectRefused`. | Operation fails with consolidated transport error; firewall hint included (single connectivity failure is sufficient when there is only one region to try). | ##### Fault-injection tests — emulator (`tests/emulator_tests/cosmos_fault_injection.rs`) @@ -340,15 +306,13 @@ Inject the listed failures via the existing fault-injection rules (extended in P | # | Injected fault | Expected result | | --- | --- | --- | -| F1 | Inject `connect_refused` on every G2 endpoint, enough times to meet the trip threshold. | First operations fail with G2 transport error; the operation issued after the trip succeeds via G1; diagnostics on the post-trip attempt show `gateway20_decision_reason = BreakerOpen` and `breaker.state = "open"`. | -| F2 | Inject TLS handshake failure on every G2 endpoint. | Same as F1; classifier MUST produce `TlsHandshakeFailure`. | -| F3 | Inject HTTP/2 negotiation failure (server returns HTTP/1.x bytes). | Same as F1; classifier MUST produce `Http2NegotiationFailure`. | -| F4 | Inject HTTP 503 (server-side, not transport-side) on every G2 endpoint. | Operations retry per existing 503 policy and may eventually fail; breaker MUST stay `Closed` (the network is fine); diagnostics show `gateway20_decision_reason = Eligible`. | -| F5 | Inject `connect_refused` on G2; after `C` elapses, stop injecting. | Probe operation succeeds via G2; subsequent operations route via G2; diagnostics show breaker recovered to `Closed`. | -| F6 | Inject `connect_refused` on G2 *and* `service_unavailable` on G1. | Operations fail (no transport works); breaker is `Open`; diagnostics make both failure causes legible. | -| F7 | Inject `connect_refused` on G2 westus2 only; verify subsequent G1 routing to westus2 succeeds (does not cross-region failover). | Once the breaker is open, a G1-routed op against westus2 MUST succeed — proving G2 transport failures did not poison the G1 unavailable-endpoint cache. 
| -| F8 | Trip the breaker, then refresh the account metadata mid-cooldown to add/remove G2 endpoints. | Breaker stays `Open` until cooldown regardless of topology changes; once cooldown elapses, the probe runs against the current topology (not the trip-time topology). | -| F9 | Trip the breaker while regional failover state is already active for an unrelated reason. | Breaker-forced G1 routing respects the existing regional failover decision (does not undo it); diagnostics surface both states. | +| F1 | Inject `connect_refused` on every G2 endpoint. | Operation fails with consolidated transport error; firewall hint present in error message; diagnostics show per-attempt `TransportFailureClass = ConnectRefused`. | +| F2 | Inject TLS handshake failure on every G2 endpoint. | Same as F1; classifier produces `TlsHandshakeFailure`; firewall hint present. | +| F3 | Inject HTTP/2 negotiation failure (server returns HTTP/1.x bytes) on every G2 endpoint. | Same as F1; classifier produces `Http2NegotiationFailure`; firewall hint present. | +| F4 | Inject HTTP 503 on every G2 endpoint. | Operation fails with standard 503-exhausted error; **no** firewall hint; diagnostics show `TransportFailureClass = HttpResponse`. | +| F5 | Inject `connect_refused` on G2 westus2 only (account has multiple regions). | Operation succeeds via G2 westus3 (existing regional retry); no firewall hint. | +| F6 | Inject `connect_refused` on G2 westus2 + HTTP 503 on G2 westus3. | Operation fails with standard 503-exhausted error; **no** firewall hint (mixed outcomes reject the firewall hypothesis). | +| F7 | After F1 fails, set `options.gateway20_disabled = true` on the same client and retry the workload. | Operations succeed via G1 against the same per-region URLs that were marked unavailable for G2 — validates the cache-isolation fix end-to-end. 
| ##### Live tests — Gateway 2.0 live pipeline (`tests/emulator_tests/gateway20_e2e.rs`) @@ -356,16 +320,16 @@ The live job already authenticates against a dedicated Gateway 2.0-enabled accou | # | Scenario | Expected result | | --- | --- | --- | -| L1 | Normal operation against the Gateway 2.0 account. | Diagnostics show `gateway20_decision_reason = Eligible`, `breaker.state = "closed"`. | -| L2 | Set `options.gateway20_disabled = true`, run the same workload. | Diagnostics show `gateway20_decision_reason = OperatorDisabled`, breaker remains inert (`recent_failures = 0`). | +| L1 | Normal operation against the Gateway 2.0 account. | Diagnostics show `gateway20_decision_reason = Eligible`. | +| L2 | Set `options.gateway20_disabled = true`, run the same workload. | Diagnostics show `gateway20_decision_reason = OperatorDisabled`. | -A "firewall in CI" test (force outbound block to G2 hostnames) is **not** added to the standard live job because it requires container-network manipulation that the SDK live pipeline is not currently set up to perform. Java has the same gap (its `Http2ConnectionLifecycleTests` is in the explicitly-excluded `manual-thinclient-network-delay` group). Track as a separate manual test ticket; do not block the breaker landing on it. +A "firewall in CI" test (force outbound block to G2 hostnames) is **not** added to the standard live job because it requires container-network manipulation that the SDK live pipeline is not currently set up to perform. Java has the same gap (its `Http2ConnectionLifecycleTests` lives in the explicitly-excluded `manual-thinclient-network-delay` group). Track as a separate manual test ticket; the fault-injection coverage above demonstrates the contract. #### 4.3.5 Open sub-questions for this section -- **Q4a** — Should the breaker fire the first probe **automatically** in the background after `C` instead of waiting for the next user op? (Decision: no, as argued above. Captured here in case the team revisits.) 
-- **Q4b** — Should the trip threshold `N` and window `W` be tunable via env var? (Default position: no — these are SDK-level safety parameters and customers should not tune them in production. Re-open if the live bake-in shows the defaults are wrong for a specific account topology.) -- **Q4c** — Should the breaker be allowed to *escalate* (e.g. mark the entire account as G2-incompatible in the account-cache so a fresh client picks it up)? (Decision: no, scope creep. Each client makes its own observation.) +- **Q4a — Revisit Option B (auto-fallback)?** Default position: only with telemetry from the live bake-in showing widespread firewall trouble that the fail-fast hint does not mitigate. Option B is API-additive — it can ship later as a connection-pool option (e.g. `gateway20_auto_fallback: bool`) defaulting to `false` so existing fail-fast semantics remain the default. +- **Q4b — Should the firewall hint mention the specific port (`:444`)?** Default position: no — the test fixture uses `:444` but production may use a different port; the hint stays abstract ("Gateway 2.0 hostnames or ports") and the per-attempt diagnostics carry the exact endpoint URLs. +- **Q4c — Should `gateway20_disabled` accept a structured reason for diagnostic logging instead of a `bool`?** Default position: no, scope creep. Operator-set settings are settings, not telemetry. If we want to record _why_ the operator opted out, that belongs in customer-side logs. --- @@ -846,4 +810,4 @@ A **new dedicated CI pipeline** is required for gateway 2.0 live tests. Gateway - **Q1 — HTTP/2 prior knowledge vs ALPN**: _Resolved_. Gateway 2.0 always uses HTTP/2; the proxy does not accept HTTP/1.x. Rust uses HTTP/2 with prior knowledge on the Gateway 2.0 transport (no ALPN fallback to HTTP/1.x). The broader ALPN default in `TRANSPORT_PIPELINE_SPEC.md` does **not** apply to Gateway 2.0; if HTTP/2 negotiation fails, the request fails and the existing retry policies handle it. 
- **Q2 — Live test account provisioning**: Cosmos DB account configuration flags required to enable Gateway 2.0 endpoints are not part of the standard Bicep templates. _Resolution_: hardcode a dedicated, pre-provisioned Gateway 2.0 account for the gateway 2.0 live tests pipeline and reuse it across runs (rather than provisioning per-run via Bicep). Account name and credentials stored in pipeline secrets (`AZURE_COSMOS_GW20_ENDPOINT`, `AZURE_COSMOS_GW20_KEY`); pipeline reads endpoint from environment variables. - **Q3 — EPK range header names**: _Resolved_. The Gateway 2.0 proxy requires the Java header names `x-ms-thinclient-range-min` / `x-ms-thinclient-range-max`. Phase 2 introduces new constants (`GATEWAY20_RANGE_MIN`, `GATEWAY20_RANGE_MAX`) on the Gateway 2.0 path; the existing `START_EPK` / `END_EPK` (`x-ms-start-epk` / `x-ms-end-epk`) constants remain for any non-Gateway-2.0 callers but are **not** emitted on Gateway 2.0 requests. -- **Q4 — Connectivity-failure G2 → G1 fallback**: _Specified, not yet implemented_. See §4.3 for the full design and test plan. The breaker is **not** part of the initial Gateway 2.0 enablement; it is a follow-up that must land before we can claim "Gateway 2.0 default-on is safe for customers behind firewalls". Open sub-questions for tuning live in §4.3.5. +- **Q4 — Connectivity-failure handling (G2 → G1)**: _Specified, not yet implemented_. See §4.3 for the full design and test plan. Recommendation is **fail-fast** (parity with Java/.NET): when all G2 endpoints fail with connectivity-class errors, surface a consolidated error pointing the operator at `options.gateway20_disabled = true`. Auto-fallback (a client-scoped circuit breaker) is parked under §4.3.5 Q4a as a future API-additive option that ships only with telemetry justifying it. 
The fail-fast contract — and the structured `TransportFailureClass` classifier and the G1/G2 endpoint-cache isolation it depends on — must land before we can claim "Gateway 2.0 default-on is safe for customers behind firewalls". From 449ba4c973bf455b69041d0afcc9c7f4322c6b98 Mon Sep 17 00:00:00 2001 From: tvaron3 Date: Mon, 11 May 2026 16:24:42 -0700 Subject: [PATCH 48/48] Fix cspell error: rephrase 'travelled' in transport_kind docs The Build Analyze stage flagged 'travelled' (UK spelling) in the FaultInjectionCondition::transport_kind doc comment. Rephrase the sentence to use neutral wording rather than carry a UK-spelling exception in the cosmos cspell dictionary. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../src/fault_injection/condition.rs | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/sdk/cosmos/azure_data_cosmos_driver/src/fault_injection/condition.rs b/sdk/cosmos/azure_data_cosmos_driver/src/fault_injection/condition.rs index 1ecfc3b3eaa..6e3f02f7799 100644 --- a/sdk/cosmos/azure_data_cosmos_driver/src/fault_injection/condition.rs +++ b/sdk/cosmos/azure_data_cosmos_driver/src/fault_injection/condition.rs @@ -35,9 +35,9 @@ impl FaultInjectionCondition { /// Returns the transport kind to which the fault injection applies. /// - /// When `Some`, the rule only matches requests that travelled through the - /// specified transport (e.g. `TransportKind::Gateway20`). When `None`, the - /// rule matches every transport (including metadata, gateway, and Gateway 2.0). + /// When `Some`, the rule only matches requests sent over the specified + /// transport (e.g. `TransportKind::Gateway20`). When `None`, the rule + /// matches every transport (including metadata, gateway, and Gateway 2.0). pub fn transport_kind(&self) -> Option { self.transport_kind }
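Tests U2 through U13 and F1 through F6 exercise a fail-fast rule that reduces to a predicate over the per-attempt failure classes. The firewall hint is emitted only when Gateway 2.0 was actually attempted and every G2 attempt failed with a connectivity-class error.

The Rust below is a minimal sketch under stated assumptions:

- `TransportFailureClass` mirrors the variant names used in the spec's test tables.
- The connectivity split (DNS, connect refused, connect timeout, TLS handshake, HTTP/2 negotiation, pre-response reset) follows the spec's classifier coverage.
- `Gateway20DecisionReason` and `should_emit_firewall_hint` are hypothetical names, not the driver's actual API.

The predicate is meant to be consulted only after an operation has already failed. A successful retry in another region (U2, F5) never reaches it.

```rust
/// Hypothetical mirror of the spec's closed `TransportFailureClass` enum.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TransportFailureClass {
    DnsFailure,
    ConnectRefused,
    ConnectTimeout,
    TlsHandshakeFailure,
    Http2NegotiationFailure,
    PreResponseReset,
    HttpResponse,
    PostSendTimeout,
    MidStreamFailure,
    AuthFailure,
    DecodeError,
    OperationCancelled,
}

impl TransportFailureClass {
    /// True when the failure is consistent with a blocked network path,
    /// i.e. nothing proves that the request ever reached the G2 proxy.
    pub fn is_connectivity(self) -> bool {
        matches!(
            self,
            Self::DnsFailure
                | Self::ConnectRefused
                | Self::ConnectTimeout
                | Self::TlsHandshakeFailure
                | Self::Http2NegotiationFailure
                | Self::PreResponseReset
        )
    }
}

/// Hypothetical routing decision, named after the spec's diagnostics values.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Gateway20DecisionReason {
    Eligible,
    OperatorDisabled,
    OperationIneligible,
    NoGateway20Endpoints,
}

/// Decides whether a failed operation's error should carry the firewall hint.
/// `g2_attempts` holds the failure class of every G2 attempt (one per region tried).
pub fn should_emit_firewall_hint(
    reason: Gateway20DecisionReason,
    g2_attempts: &[TransportFailureClass],
) -> bool {
    // G2 must have been selected and actually attempted at least once...
    reason == Gateway20DecisionReason::Eligible
        && !g2_attempts.is_empty()
        // ...and any proof of an open path (HTTP status, mid-stream failure,
        // post-send timeout) anywhere rules out the firewall hypothesis.
        && g2_attempts.iter().all(|c| c.is_connectivity())
}
```

The predicate requires every attempt to be connectivity-class. A single HTTP 503 or mid-stream failure on any endpoint (U6, U7, F6) therefore suppresses the hint. A lone connectivity failure on a single-region account (U13) is still enough to trigger it.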