Add Gateway 2.0 implementation to Cosmos driver #4319

Draft
tvaron3 wants to merge 46 commits into Azure:release/azure_data_cosmos-previews from tvaron3:tvaron3/gateway-2.0-impl

tvaron3 commented Apr 30, 2026

Implements Cosmos DB Gateway 2.0 (a.k.a. "thin client") in the Rust driver and SDK, per the design spec merged in #4223.

This is a multi-slice implementation that adds:

  • The RNTBD-over-HTTP/2 wire format (Slice 1)
  • The driver-side foundation — request types, capabilities header, account-properties probe surface (Slice 2)
  • Routing eligibility & EPK derivation (Slice 3a)
  • The end-to-end RNTBD dispatch path with is_gateway20_allowed pool flag, fallback to standard gateway, and shared dispatch state (Slice 3b/c)
  • Phase 6 testing & infrastructure: dedicated CI pipeline + matrix, capturing-transport pipeline tests, fault-injection contract locks, and E2E scaffolding

Status: DRAFT — do not review. Several follow-ups are required before this is mergeable. They are listed at the end of this description.

Commits

| Commit | Title | Notes |
|---|---|---|
| 8363209 | Add Gateway 2.0 RNTBD wire format (Slice 1) | RNTBD frame encode/decode, header/token enums, capabilities bitmask 9 |
| 802e479 | Add Gateway 2.0 foundation (Slice 2) | TransportKind::Gateway20, account-properties probe wiring, fallibility plumbing |
| c475d87 | Add Gateway 2.0 routing eligibility and endpoint key derivation (Slice 3a) | Gates which operations route to GW2.0; derives the EPK used at dispatch time |
| 6963626 | Add Gateway 2.0 RNTBD dispatch (Slice 3b/c) | Dispatch path; pool-level is_gateway20_allowed flag; standard-gateway fallback |
| 56890a7 | Use AcqRel ordering for Gateway 2.0 transport request id | Deep-review nit; counter would be correct under Relaxed but AcqRel improves diagnostic-trace clarity |
| 27218fb | Add Gateway 2.0 Phase 6 testing & infrastructure | CI yaml, matrix, integration tests, fault scenarios, E2E stubs (see below) |

Phase 6: testing & infrastructure

CI

  • sdk/cosmos/ci-gateway20.yml — dedicated PR-and-manual live-test pipeline. Reads AZURE_COSMOS_GW20_ENDPOINT / AZURE_COSMOS_GW20_KEY from pipeline secrets (the spec's Q2 — the GW2.0 account is pre-provisioned out-of-band; standard ARM/Bicep cannot create one).
  • sdk/cosmos/live-gateway20-matrix.json: gateway20 (single region) and gateway20_multi_region test categories.

Driver pipeline tests (gateway20_pipeline_tests.rs, gated #[cfg(feature = "__internal_mocking")])

A CapturingTransport + CapturingFactory pair that refuses every send and records the outgoing request. Active tests:

  • Operator override at the pool level disables GW2.0 dispatch
  • V1 dual-consistency-header invariant (regression guard — current V1 path emits no consistency header)
  • V2 RNTBD dual-token contract lock (asserts the observable HTTP-boundary contract)
  • Capabilities header pinned to "9"

Three live-account stubs (#[ignore]) lock in test names for stored-proc fallback, diagnostics validation, and SDK-boundary operator override.
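
The capturing pattern described above can be sketched roughly as follows. This is a minimal illustration, not the driver's test harness: the `Request` struct, the `Transport` trait, and the `header` helper are hypothetical stand-ins, and only the names `CapturingTransport` and the header `x-ms-cosmos-sdk-supportedcapabilities` come from this PR.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical minimal request type; the driver's real request type is richer.
#[derive(Clone, Debug, PartialEq)]
struct Request {
    url: String,
    headers: Vec<(String, String)>,
}

/// Hypothetical transport abstraction used only for this sketch.
trait Transport {
    fn send(&self, request: Request) -> Result<(), String>;
}

/// Records every outgoing request, then refuses to send it.
#[derive(Clone, Default)]
struct CapturingTransport {
    captured: Arc<Mutex<Vec<Request>>>,
}

impl Transport for CapturingTransport {
    fn send(&self, request: Request) -> Result<(), String> {
        self.captured.lock().unwrap().push(request);
        Err("send refused by capturing transport".to_string())
    }
}

impl CapturingTransport {
    /// Returns a snapshot of everything recorded so far.
    fn captured(&self) -> Vec<Request> {
        self.captured.lock().unwrap().clone()
    }
}

/// Case-insensitive header lookup on a captured request.
fn header<'a>(request: &'a Request, name: &str) -> Option<&'a str> {
    request
        .headers
        .iter()
        .find(|(k, _)| k.eq_ignore_ascii_case(name))
        .map(|(_, v)| v.as_str())
}

fn main() {
    let transport = CapturingTransport::default();
    let request = Request {
        url: "https://example.invalid/dbs/db/colls/c/docs/1".to_string(),
        headers: vec![(
            "x-ms-cosmos-sdk-supportedcapabilities".to_string(),
            "9".to_string(),
        )],
    };
    // The send is refused, but the request is still available for assertions.
    assert!(transport.send(request).is_err());
    let captured = transport.captured();
    assert_eq!(
        header(&captured[0], "x-ms-cosmos-sdk-supportedcapabilities"),
        Some("9")
    );
}
```

Refusing the send keeps such tests hermetic: the assertion target is the outgoing request, so no response needs to be faked.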

Fault injection (additive)

  • driver_fault_injection.rs — three new emulator-gated tests: 503 → regional failover, 408 → cross-region for reads, 404/1002 → remote-preferred without PKRange refresh. Each carries a TODO(Phase 6) describing how to scope to the GW2.0 transport once FaultInjectionCondition::with_transport_kind lands.
  • cosmos_fault_injection.rs — one new GW2.0 ConnectionError → standard-gateway fallback contract lock.

SDK E2E (gateway20_e2e.rs)

Six placeholder tests gated on test_category = "gateway20" plus the two env vars. Bodies are intentionally TODO(Phase 6) until CosmosClientOptions exposes a public Gateway 2.0 toggle. The names lock in the contract: point CRUD, query streaming, transactional batch, change-feed (latest version), diagnostics validation, SDK-boundary operator override.

Verification

cargo fmt --check                                        # clean
cargo clippy --all-features --all-targets -- -D warnings # clean
cargo build  --all-features --tests                      # clean
cargo test   -p azure_data_cosmos_driver --all-features  # 744 unit + 4 new pipeline (3 ignored stubs)
cargo test   -p azure_data_cosmos        --all-features  # 463 unit + new e2e/fault stubs (ignored)

Two deep-review cycles were run against the stack — one on the impl Slices 1–3b/c (one Tier 2 finding, addressed in 56890a7) and one on the Phase 6 commit (one Tier 2 consistency suggestion that was not actionable under repo conventions).

Documented follow-ups (not blocking design review of this stack, but BLOCK merge)

  • Slice 3d: EPK cutover for queries / read-feeds (R5 in the spec).
  • HPK partial-PK paths: cross-cutting; not addressed in any slice yet.
  • Continuation token format: Gateway 2.0 may emit a different format; not yet handled.
  • SDK-level Gateway 2.0 enable API: CosmosClientOptions::with_gateway20_disabled does not exist, which is why the six E2E tests are scaffolds.
  • FaultInjectionCondition::with_transport_kind: missing today; the four new fault tests cannot scope themselves to the Gateway 2.0 transport and run on whichever transport is selected at dispatch.
  • is_gateway20_allowed → gateway20_disabled rename (R15): pool option pre-dates the spec's "negative-term flags only" rule.
  • Spec text fix (separate spec PR): ConsistencyLevel token id is 0x0010, not 0x00F0.
  • Spec text (separate spec PR): explain why Rust uses capabilities bitmask 9 while Java uses 11.

Implementation reference: #4223 (spec). See sdk/cosmos/azure_data_cosmos_driver/docs/GATEWAY_20_SPEC.md for the full design.

tvaron3 and others added 30 commits April 20, 2026 11:29
Add design specification for Gateway 2.0 (formerly thin client) support
in the Rust Cosmos driver and SDK. The spec covers motivation,
current Rust state, a phased implementation plan (RNTBD protocol,
request pipeline, endpoint discovery, retry/errors, SDK integration,
and a dedicated live tests pipeline), and remaining open questions.

Gateway 2.0 is auto-detected from account metadata and is not exposed
as a customer-facing configuration option.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses all 21 findings from the PR Deep Reviewer run on Azure#4223:

- Fix EPK attribution: canonical path is the driver's
  EffectivePartitionKey::compute/compute_range. Flag the SDK-side
  get_hashed_partition_key_string as known-broken for MultiHash
  and must not be wired to Gateway 2.0 header injection.
- Resolve stored-procedure contradiction: follow Java (no SP via
  thin client); explicitly reject .NET's ExecuteJavaScript allowance.
- Keep RNTBD field widths as uint16 LE, cite Java
  RntbdRequestFrame.encode's writeShortLE. Reviewer's uint32 claim
  was incorrect for thin client.
- Add single-source-of-truth gating model (section 3.4) with invariants.
- Add fallback taxonomy (eligibility vs failure), read/write region
  pairing, PLF precedence, env var reframed as unsupported override.
- Standardize on is_operation_supported_by_gateway20().
- Add Status/Date/Authors header, numbered TOC, Q1 through Q4 open
  questions, Related Specs cross-links, header-name wire-to-constant
  table, range-header wire format notes, wire-format ambiguity
  resolutions (length-inclusive, UUID byte order, payload-presence).
- Expand test matrix with error-case EPK, StoredProc-rejected
  assertion, Bulk vs Batch, failure-fallback unwind, PLF precedence.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses three findings from the PR Deep Reviewer second pass:

- F-A: replace the non-existent epk_length_aware_cmp citation with
  EffectivePartitionKey's Ord/cmp impl, cite the actual epk_cmp_*
  tests in container_routing_map.rs and the binary_search_by
  consumer site. Point PR Azure#4087 at the correct claim.
- F-B: fix the numerically wrong UUID worked example. The previous
  example for 12345678-1234-5678-1234-567812345678 wrote MSB bytes
  78 56 34 12 34 12 78 56, conflating writeLongLE with byte-reversal
  of the hyphen groups. Replace with 0a1b2c3d-4e5f-6789-abcd-
  ef0123456789 so MSB and LSB give visually distinct LE sequences.
- F-C: add a "Proxy unreachable definition" subsection enumerating
  transport-level (TCP refuse/timeout, TLS handshake, HTTP/2 GOAWAY,
  reqwest::Error connect/timeout/request before any status) and
  HTTP-infrastructure classes (502, 504, 503-without-Cosmos-
  substatus). Explicitly exclude responses carrying a Cosmos
  sub-status. Defer to TRANSPORT_PIPELINE_SPEC for broader
  classification. Cross-reference from the Retry Decision Table.
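
To make the F-B correction concrete, here is a small sketch of the layout it describes: the UUID's most-significant 64 bits written little-endian, followed by the least-significant 64 bits written little-endian. The function name is hypothetical, and it assumes the Java-style MSB/LSB split of the 128-bit value.

```rust
/// Encodes a UUID, given as its 128-bit big-endian value, as
/// [u64 LE most-significant half][u64 LE least-significant half].
fn uuid_to_msb_lsb_le(uuid: u128) -> [u8; 16] {
    let msb = (uuid >> 64) as u64; // upper 64 bits
    let lsb = uuid as u64; // lower 64 bits (truncating cast)
    let mut out = [0u8; 16];
    out[..8].copy_from_slice(&msb.to_le_bytes());
    out[8..].copy_from_slice(&lsb.to_le_bytes());
    out
}

fn main() {
    // 0a1b2c3d-4e5f-6789-abcd-ef0123456789 with the hyphens removed.
    let uuid: u128 = 0x0a1b2c3d_4e5f_6789_abcd_ef0123456789;
    let hex: Vec<String> = uuid_to_msb_lsb_le(uuid)
        .iter()
        .map(|b| format!("{:02x}", b))
        .collect();
    // prints "89 67 5f 4e 3d 2c 1b 0a 89 67 45 23 01 ef cd ab"
    println!("{}", hex.join(" "));
}
```

For the earlier example 12345678-1234-5678-1234-567812345678, the same function yields 78 56 34 12 78 56 34 12 for the MSB half, not the 78 56 34 12 34 12 78 56 sequence that came from reversing each hyphen group separately.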

Also add a "Java parity" subsection to Phase 4 documenting that
ThinClientStoreModel extends RxGatewayStoreModel, that none of the
Java retry policies have thin-client-specific code, and that the
Rust failure-fallback counter is more thin-client-aware than Java's.
Flag a Java behavioral nuance worth NOT replicating: Java marks the
gateway endpoint (not the thin-client endpoint) unavailable on a
thin-client 503.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per follow-up review, remove the Phase 4 "Java parity" subsection.
The cross-SDK observations belong in design discussion, not in the
shipping design spec.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Q1 (HTTP/2 vs ALPN): resolved. Gateway 2.0 is HTTP/2-only with
  prior knowledge; the proxy does not accept HTTP/1.x. Negotiation
  failure feeds the failure-fallback counter rather than downgrading.
- Q3 (EPK range header names): resolved. Proxy requires the Java
  names x-ms-thinclient-range-min / -max. Phase 2 introduces new
  THINCLIENT_RANGE_MIN / _MAX constants; START_EPK / END_EPK are
  not emitted on the Gateway 2.0 path.
- Q4 (failure-fallback thresholds): clarified initial values
  (N=3 in 30s sliding window, 60s cooldown) and noted the live
  test pipeline is the tuning surface; thresholds are internal,
  not customer-tunable.

Updates Phase 2 header naming table and §3.1 / §3.3 references
accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Existing retry policies (ClientRetryPolicy and friends) already
cover the rows in that table; the spec was duplicating
cross-cutting behavior. Updated the surrounding "not Proxy
unreachable" bullets to reference the existing retry path
instead of the removed table.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Java's thin client has no equivalent mechanism: ThinClientStoreModel
extends RxGatewayStoreModel, model selection is per-request and
stateless, and the existing ClientRetryPolicy / WebExceptionRetryPolicy
chain handles transport errors, 502/503/504, and regional
unavailability uniformly across both transport modes. Rust takes the
same posture - no per-partition counter, no sticky standard-gateway
state, no cooldown timer, no Proxy-unreachable classification, no
new gateway20_retry.rs state machine.

Removed:
- Failure fallback row from the Phase 4 fallback taxonomy
- Proxy unreachable definition subsection
- Failure-fallback references in Phase 4 retry list, files-changed,
  and test matrix
- Open Question Q4 (thresholds) - no longer applicable
- Failure-fallback counter mentions in §3.1 and Q1

The single remaining fallback is the per-request Eligibility fallback
(operation not supported by Gateway 2.0 -> standard gateway), which
is unrelated to failure handling.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
RNTBD is "Real Name To Be Determined" - a placeholder name that
stuck, not "Reliable Network Transfer Binary Data" (a backronym
that LLM analysis tends to invent).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two fixes for the PR analyze stage:

- cspell: add inline `cspell:ignore` directive for spec-specific
  jargon (THINCLIENT, thinclient, Mgmt, cutover, directconnectivity,
  footgun, cooldown, ALPN). Scoped to this file rather than the
  global word list since these are spec-only terms.
- Link verification: convert relative sibling-spec links
  (TRANSPORT_PIPELINE_SPEC.md, PARTITION_KEY_RANGE_CACHE_SPEC.md,
  PARTITION_LEVEL_FAILOVER_SPEC.md) to absolute GitHub URLs as
  required by the Verify-Links guideline (sdk/ paths are not in
  allow-relative-links.txt).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Timeout policy: note that any Gateway 2.0-specific timeout tuning
  is deferred to a follow-up.
- PLF interaction: PLF picks the region; within that region Gateway
  2.0 is preferred whenever a gateway20_url is available, otherwise
  the request falls back to standard gateway. Removes the previous
  "PLF wins" framing that implied PLF always defeats Gateway 2.0.
- Drop the 408/503/410 retry-behavior bullets. The section already
  states retry policies are identical to standard gateway, so
  re-listing them risked drift from the canonical policy.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Apply Kevin's unresolved review feedback on PR Azure#4223:

- Reword §1 Overview to say "RNTBD binary serialization over the
  HTTP/2 protocol" (clearer separation of serialization vs. transport).
- Soften the SLA latency bullet in §2 Key Benefits to "plans to provide
  contractual latency commitments" — we have not updated contractual
  terms yet, so avoid overpromising ahead of broad-usage measurement.
- Add a cross-partition query aggregation bullet under §2 Design
  Philosophy → SDK Responsibility, noting it stays client-side under
  Gateway 2.0 (no server-side aggregation).
- Fix the Protocol row in the Connection Mode Comparison table:
  HTTP/REST → REST/HTTP.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Updates the Gateway 2.0 design spec in response to PR Azure#4223 review feedback
from analogrelay and FabianMeiswinkel.

- Routing ownership: clarify SDK keeps regional + partition-level routing;
  only replica-level routing within a partition moves to the proxy.
- API correctness: replace the fictional 'CosmosClient::create_item(T)'
  walkthrough with the real 'ContainerClient::create_item(partition_key,
  item, options)' signature in sections 4.1 and 4.2.
- SDK <-> driver boundary: mark section 3.6-10 resolved. Gateway-2.0
  constants ('THINCLIENT_PROXY_*', range headers, etc.) live exclusively
  in 'azure_data_cosmos_driver::constants' with no SDK re-export. The SDK
  invokes the generic 'CosmosDriver::execute_operation' interface and the
  driver decides Gateway 2.0 vs standard gateway internally.
- SPROC: drop the .NET-vs-Java framing. Stored-procedure execution is out
  of scope for Rust SDK GA; eligibility fallback routes any incoming SPROC
  request to the standard gateway.
- New gap (3.6-11): pre-Phase-2 audit deliverable to enumerate every
  EPK/PartitionKeyRange-shaped struct across both crates and consolidate
  on a single canonical type.
- Operator override: restore 'CosmosClientOptions::gateway20_disabled'
  (default false) as the single supported public kill-switch. No env var
  (intentional discouragement of casual / fleet-wide enablement). Carries
  an explicit warning that flipping it voids Gateway 2.0's latency SLA
  and impacts 24/7 Microsoft support eligibility for performance
  regressions.
- New gap (3.6-12): retry behavior for 449 (Retry-With: same endpoint,
  standard backoff, no region switch) and 404 / sub-status 1002
  (PARTITION_KEY_RANGE_GONE: refresh PKRange cache, prefer remote region;
  PLF region wins when PLF has pinned the PKRangeId). Phase 6 test matrix
  expanded with both rows.
- HPK gating refinement (carried forward from round 4): only emit
  'x-ms-documentdb-partitionkey' alongside the EPK header(s) when the
  request carries the FULL partition key (point ops on any container, and
  full-key single-logical-partition queries on HPK containers). Prefix
  queries on HPK containers emit the EPK-range headers only.
- Prose scrub: 'thin client' / 'thin-client' rewritten to 'Gateway 2.0'
  in body prose. Wire-header literals ('x-ms-thinclient-*',
  'thinClientReadableLocations'), Rust API symbol names
  ('has_thin_client_endpoints'), and .NET / Java symbol references
  retained verbatim.
- cspell pass clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace 'footgun' with 'pitfall' in two prose locations (the Java
header-mutation hazard heading and the Range semantics blockquote)
and drop 'footgun' from the inline cspell:ignore directive.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Drop two mentions of internal ADO PR #2031635 from the
ReadConsistencyStrategy section. Java PR #48787 and .NET PR #5685
remain as the public cross-SDK references.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Convert open-question item 11 from a deferred audit deliverable into
a concrete decision. Audit results documented inline:

- Driver crate is canonical: EpkRange<T>, PartitionKeyRange (typed
  EffectivePartitionKey bounds), and the EffectivePartitionKey newtype
  with compute_range().
- SDK-crate analogs (routing::range::Range, routing::partition_key_range,
  hash::EffectivePartitionKey) are NOT used on the Gateway 2.0 path and
  remain only for legacy non-Gateway-2.0 SDK callers.

Phase 2 EPK header injection MUST reuse the driver-crate types and MUST
NOT introduce a new EPK-range struct or depend on any SDK-crate analog.
Consistent with item 10 (no SDK Gateway-2.0 surface).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Drop the BaseProxyClientHttpMessageHandler attribution paragraph and
the HPK full-key gating bullet section (single-component vs
hierarchical, full-key vs prefix-key emission rules), and remove the
matching cross-references downstream:

- Header table row for x-ms-documentdb-partitionkey: simplified to
  describe co-emission with x-ms-effective-partition-key on point /
  single-logical-partition ops, without the prefix-key gating clause.
- Header injection flow step 4: trimmed to just emit the EPK header;
  PK-header gating clause removed.
- Test matrix: dropped the 'HPK PK+EPK pairing (full-key gating)' row.

The spec no longer prescribes any HPK-specific PK-header gating; that
behavior can be re-introduced in a follow-up if needed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the two Gateway-2.0-specific HTTP headers
(x-ms-thinclient-account-name, x-ms-thinclient-regional-account-name)
with the existing RNTBD GlobalDatabaseAccountName token (0x00CE,
String, optional), carried in the RNTBD metadata stream on every
Gateway 2.0 request.

- Headers table: dropped both -account-name and -regional-account-name
  rows.
- Prose: replaced the proxy-tenant-routing section with a brief
  Tenant-identification note pointing to the RNTBD token.
- Injection flow step 3: now serializes the RNTBD token instead of
  setting two HTTP headers.
- Test matrix: row reframed to assert presence of the RNTBD token in
  the request metadata stream.

This drops the regional-account-name carrier entirely; the proxy uses
the global account-name only.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Rename THINCLIENT_PROXY_* Rust constants to GATEWAY20_* family (wire
  header strings remain x-ms-thinclient-* server-defined)
- 449 retry policy now explicitly exponential backoff (separate budget
  from 410/Gone)
- Strengthen positive-term ban: forbid is_gateway20_allowed /
  gateway20_allowed / enable_gateway20 anywhere in driver, SDK, perf
  crate, env vars, or test wiring; only negative-term names permitted
- Restore (§3.5) reference in EPK-computation bullet per reviewer

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Drop @analogrelay attribution and the Kiran context pointer; replace
with neutral phrasing. Also remove the names from the cspell:ignore
directive.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per the negative-term naming rule (default values mean Gateway 2.0
enabled), rename the per-request gating flag and invert its logic.
Update §3.1 formula, the Phase 1 request-flow step, and inline
references.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove firewall-rules bullet and table row (thread Azure#4 — port-based
  firewall framing applies to all transports, not Gateway 2.0 specifically).
- Remove HTTP/2 multiplexing bullet and tighten protocol cell (thread Azure#5 —
  multiplexing is a transport feature shared with Gateway V1, not a
  Gateway-2.0-specific benefit).
- Reword §3 routing-decision scope: 'once per request' -> 'once per logical
  operation, inherited by retries and sub-requests' (thread Azure#6) — prevents
  mid-operation transport-mode flips that would fragment diagnostics and
  break session-token affinity.
- §4.1 449 retry: introduce a new ThrottleAction::RetryWith { delay,
  new_state: RetryWithState } variant in driver/pipeline/components.rs and
  extend decide_throttle_action in transport_pipeline.rs (thread Azure#7),
  guaranteeing structurally that the 449 budget is independent of the
  410/Gone and 429 budgets.
- §4.2 sub-status correction (thread Azure#8): the parent status is 404, and
  the sub-status name is READ_SESSION_NOT_AVAILABLE (not
  PARTITION_KEY_RANGE_GONE — that name belongs to 410/1002). Body rewritten
  to reflect the session-token-stale semantics: existing 404 retry path
  applies; the only Gateway-2.0-specific deviation is that we do NOT
  refresh the PKRange cache on 404/1002. Test-matrix row updated.
- Test coverage: add HPK + Gateway 2.0 row exercising full vs partial PK
  forms (full -> x-ms-effective-partition-key, partial -> x-ms-thinclient-
  range-min / -max via EffectivePartitionKey::compute_range()).
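
The full-versus-partial partition-key row in the last bullet could be sketched like this. The header names are the ones quoted in this spec; the `PartitionKeyForm` enum, the `routing_headers` function, and the placeholder EPK strings are hypothetical, and the real path would obtain the values from `EffectivePartitionKey::compute` / `compute_range`.

```rust
// Header names as quoted in the spec text.
const EPK_HEADER: &str = "x-ms-effective-partition-key";
const RANGE_MIN_HEADER: &str = "x-ms-thinclient-range-min";
const RANGE_MAX_HEADER: &str = "x-ms-thinclient-range-max";

/// Hypothetical stand-in for the two partition-key forms a request can carry.
enum PartitionKeyForm {
    /// Every component of the key is present: a single EPK identifies the target.
    Full { epk: String },
    /// Only a prefix of a hierarchical key is present: the target is an EPK range.
    Prefix { min_epk: String, max_epk: String },
}

/// Chooses the routing headers to emit for a given partition-key form.
fn routing_headers(form: &PartitionKeyForm) -> Vec<(&'static str, String)> {
    match form {
        PartitionKeyForm::Full { epk } => vec![(EPK_HEADER, epk.clone())],
        PartitionKeyForm::Prefix { min_epk, max_epk } => vec![
            (RANGE_MIN_HEADER, min_epk.clone()),
            (RANGE_MAX_HEADER, max_epk.clone()),
        ],
    }
}

fn main() {
    // Placeholder EPK values, not real hashes.
    let full = PartitionKeyForm::Full { epk: "05C1".to_string() };
    let prefix = PartitionKeyForm::Prefix {
        min_epk: "05C1".to_string(),
        max_epk: "05C1FF".to_string(),
    };
    for (name, value) in routing_headers(&full).iter().chain(routing_headers(&prefix).iter()) {
        println!("{name}: {value}");
    }
}
```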

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the 'follow existing 404 path' body in §4.2 with the prior
'prefer remote region + PLF precedence' wording. Update the matching
test-matrix row to assert prefer-remote routing, PLF override, and the
no-PKRange-refresh invariant on 404/1002.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Introduces the rntbd module: request frame serializer, response frame
deserializer, token codecs (Byte/UShort/ULong/Long/ULongLong/LongLong/Guid/
SmallString/String/ULongString/SmallBytes/Bytes/ULongBytes/Float/Double),
and HTTP-status mapping with optional sub-status enrichment.

Activity-Id is encoded as the 16-byte [u64 LE msb][u64 LE lsb] pair on the
wire (matching Java/.NET RNTBD), distinct from the metadata Guid token
encoding (MS GUID fields LE via Uuid::as_fields).

The capabilities header advertises bitmask "9" (PartitionMerge |
IgnoreUnknownRntbdTokens). Unlike Java which advertises "11" (also
ChangeFeedWithStartTimeFromBeginning), Rust intentionally skips that bit
in Slice 1 because the underlying behavior is not yet implemented;
advertising an unimplemented capability would violate the contract.
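
As a quick arithmetic check of the two header values, the sketch below composes the masks from individual bits. The totals 9 and 11 come from this commit message; the individual bit assignments are assumptions inferred from those totals (11 - 9 = 2 fixes the change-feed bit, while which of the other two flags is 1 and which is 8 is assumed here).

```rust
// Assumed bit values; only the totals "9" and "11" are stated in the text.
const PARTITION_MERGE: u64 = 1;
const CHANGE_FEED_WITH_START_TIME_FROM_BEGINNING: u64 = 2;
const IGNORE_UNKNOWN_RNTBD_TOKENS: u64 = 8;

/// Capabilities advertised by the Rust driver in Slice 1.
fn rust_capabilities() -> u64 {
    PARTITION_MERGE | IGNORE_UNKNOWN_RNTBD_TOKENS
}

/// Capabilities advertised by Java, per the commit message.
fn java_capabilities() -> u64 {
    rust_capabilities() | CHANGE_FEED_WITH_START_TIME_FROM_BEGINNING
}

/// The header carries the mask as a decimal string.
fn header_value(mask: u64) -> String {
    mask.to_string()
}

fn main() {
    println!("rust: {}", header_value(rust_capabilities())); // prints "rust: 9"
    println!("java: {}", header_value(java_capabilities())); // prints "java: 11"
}
```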

Per AGENTS.md ("Prefer Result::Err over panicking"), serialize() and the
underlying token write_to / write_len_prefixed_* helpers all return
azure_core::Result so oversized inputs (frame > u32, SmallString > 255,
String > 64 KB) surface as data-conversion errors instead of panicking.
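
A minimal sketch of that fallible-write pattern, under stated assumptions: the helper names, the local `TooLong` error type (the real helpers return `azure_core::Result`), and the prefix widths (one byte for SmallString, a little-endian u16 for String, which is what the 255-byte and 64 KB limits suggest) are illustrative rather than taken from the module.

```rust
use std::convert::TryFrom;

/// Local error type for this sketch only.
#[derive(Debug, PartialEq)]
struct TooLong {
    kind: &'static str,
    len: usize,
    max: usize,
}

/// Writes a 1-byte length prefix followed by the UTF-8 bytes.
fn write_small_string(buf: &mut Vec<u8>, value: &str) -> Result<(), TooLong> {
    // The length check happens before any byte is written, so a failure
    // leaves `buf` untouched.
    let len = u8::try_from(value.len()).map_err(|_| TooLong {
        kind: "SmallString",
        len: value.len(),
        max: u8::MAX as usize,
    })?;
    buf.push(len);
    buf.extend_from_slice(value.as_bytes());
    Ok(())
}

/// Writes a 2-byte little-endian length prefix followed by the UTF-8 bytes.
fn write_string(buf: &mut Vec<u8>, value: &str) -> Result<(), TooLong> {
    let len = u16::try_from(value.len()).map_err(|_| TooLong {
        kind: "String",
        len: value.len(),
        max: u16::MAX as usize,
    })?;
    buf.extend_from_slice(&len.to_le_bytes());
    buf.extend_from_slice(value.as_bytes());
    Ok(())
}

fn main() {
    let mut buf = Vec::new();
    write_small_string(&mut buf, "ab").unwrap();
    write_string(&mut buf, "abc").unwrap();
    println!("{:?}", buf);
    // An oversized input becomes an error value instead of a panic.
    println!("{:?}", write_small_string(&mut buf, &"x".repeat(256)));
}
```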

The module is gated by #[allow(dead_code, unused_imports)] and is not yet
wired into the transport pipeline; Slice 2 will add the dispatcher and
operation-eligibility filter.

Implements R1, R2, R3 and AC1-AC4 from .coding-harness/spec.json, mapping
to GATEWAY_20_SPEC.md §5 Phase 1.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Lays the foundation for Gateway 2.0 request handling without yet routing
any traffic through it. Three deliverables:

* **Eligibility helper** — pure `is_operation_supported_by_gateway20`
  that returns `true` only for Document × {Create, Read, Replace, Upsert,
  Delete, Query, SqlQuery, QueryPlan, ReadFeed, Batch}. Both the outer
  `ResourceType` and inner `OperationType` matches are exhaustive (no
  wildcard arms) so any new enum variant is a compile-time error,
  forcing an explicit eligibility decision rather than a silent
  fail-closed default.
* **Account-name extraction** — `AccountEndpoint::global_database_account_name`
  parses the host's first label and returns it for Cosmos endpoints
  (`*.documents.azure.{com,us,cn}`). Returns `None` for the emulator,
  IPv4/IPv6 literals, and custom domains; Slice 3 will read this when
  emitting the RNTBD `GlobalDatabaseAccountName` token.
* **Constants relocation** — new `azure_data_cosmos_driver::constants`
  owns the canonical `x-ms-thinclient-*` and `x-ms-effective-partition-key`
  wire strings under `GATEWAY20_*` identifiers. The SDK's
  `THINCLIENT_PROXY_OPERATION_TYPE` / `_RESOURCE_TYPE` are now
  `#[deprecated]` re-exports of the new driver constants, preserving
  public API while migrating consumers. `COSMOS_ALLOWED_HEADERS` is
  extended to keep logging behavior unchanged.
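
The exhaustive-match approach from the first bullet can be illustrated with a reduced sketch. The enums below are hypothetical subsets of the driver's `ResourceType` and `OperationType` (`Execute` in particular is an assumed variant name), so this shows the shape of the technique rather than the actual helper.

```rust
/// Hypothetical subset of the driver's resource types, for illustration only.
#[derive(Clone, Copy, Debug)]
enum ResourceType {
    Document,
    StoredProcedure,
    Database,
}

/// Hypothetical subset of the driver's operation types. `Execute` is an
/// assumed name standing in for stored-procedure execution.
#[derive(Clone, Copy, Debug)]
enum OperationType {
    Create,
    Read,
    Replace,
    Upsert,
    Delete,
    Query,
    SqlQuery,
    QueryPlan,
    ReadFeed,
    Batch,
    Execute,
}

/// No `_` arms: adding a variant to either enum stops this from compiling
/// until someone decides whether the new case is eligible.
fn is_operation_supported_by_gateway20(resource: ResourceType, operation: OperationType) -> bool {
    match resource {
        ResourceType::Document => match operation {
            OperationType::Create
            | OperationType::Read
            | OperationType::Replace
            | OperationType::Upsert
            | OperationType::Delete
            | OperationType::Query
            | OperationType::SqlQuery
            | OperationType::QueryPlan
            | OperationType::ReadFeed
            | OperationType::Batch => true,
            OperationType::Execute => false,
        },
        ResourceType::StoredProcedure => false,
        ResourceType::Database => false,
    }
}

fn main() {
    let cases = [
        (ResourceType::Document, OperationType::Read),
        (ResourceType::StoredProcedure, OperationType::Execute),
        (ResourceType::Database, OperationType::Create),
    ];
    for (resource, operation) in cases {
        let eligible = is_operation_supported_by_gateway20(resource, operation);
        println!("{resource:?} x {operation:?} -> {eligible}");
    }
}
```

The trade-off is a little verbosity in exchange for the compiler flagging every new variant, which is the property the commit message calls out.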

Helper and account-name accessor are intentionally `#[allow(dead_code)]`
in this slice — Slice 3 wires them into the dispatch path. No routing,
body-wrapping, header-injection, or response-unwrap changes here.
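
For the account-name accessor, a rough standalone sketch of the host rule might look like the following. It operates on a bare, already-lowercased host string rather than on `AccountEndpoint`, and it assumes the account name must be a single label directly in front of the Cosmos suffix; both are simplifications made for this example.

```rust
/// Cosmos host suffixes named in the commit message.
const COSMOS_SUFFIXES: [&str; 3] = [
    ".documents.azure.com",
    ".documents.azure.us",
    ".documents.azure.cn",
];

/// Returns the first host label when the host is `<label>.documents.azure.{com,us,cn}`.
/// Assumes `host` is lowercase and carries no port or brackets.
fn global_database_account_name(host: &str) -> Option<&str> {
    COSMOS_SUFFIXES.iter().find_map(|suffix| {
        let label = host.strip_suffix(*suffix)?;
        // Reject an empty label and anything with extra labels in front.
        if label.is_empty() || label.contains('.') {
            None
        } else {
            Some(label)
        }
    })
}

fn main() {
    for host in [
        "myaccount.documents.azure.com",
        "localhost",
        "127.0.0.1",
        "cosmos.contoso.com",
    ] {
        println!("{host} -> {:?}", global_database_account_name(host));
    }
}
```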

Tests: +5 unit tests (constants pinning + distinctness, eligibility
matrix exhaustiveness, stored-proc explicit ineligibility, host
extraction table). Total 721 lib tests (was 716), all passing.

Validation: cargo fmt, cargo clippy --all-features --all-targets
-- -D warnings, cargo doc --no-deps, and cargo test --all-features all
clean for both azure_data_cosmos and azure_data_cosmos_driver.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…e 3a)

Wires the Slice 2 eligibility helper into operation routing and fixes a
latent connection-pool keying bug for Gateway 2.0 endpoints.

Changes:

- pipeline/components.rs: add endpoint_key field to RoutingDecision so the
  pool key is captured alongside the chosen URL rather than being recomputed
  from the underlying CosmosEndpoint cache.
- pipeline/operation_pipeline.rs:
  - resolve_endpoint now considers Gateway 2.0 only when the endpoint
    advertises it AND the account name is parseable AND the operation is
    supported by Gateway 2.0 per is_operation_supported_by_gateway20.
  - Selected URL's authority is used to derive endpoint_key when routing
    through Gateway 2.0; otherwise the existing endpoint cache key is reused.
  - account_endpoint.global_database_account_name().is_some() is inlined at
    the call site; full Option<String> threading is deferred to Slice 3b/c.
- transport/mod.rs: re-export is_operation_supported_by_gateway20 for
  pipeline consumers.

Tests:

- Three new resolve_endpoint tests cover (1) ineligible operation falling
  back to Gateway, (2) missing account name falling back to Gateway, and
  (3) Gateway 2.0 routing producing an endpoint_key derived from the
  Gateway 2.0 authority (not the gateway1 cache).
- Existing tests updated for the new resolve_endpoint signature and the
  new RoutingDecision field.

Validation: cargo fmt, cargo clippy --all-features --all-targets -- -D warnings,
cargo test -p azure_data_cosmos_driver --all-features --lib (724 passed).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Final slice of Gateway 2.0 vertical-slice plan §5: wrap a signed Cosmos
HTTP request as an RNTBD request frame on the way out, and decode the
proxy's RNTBD response frame back into a synthetic `HttpResponse` on the
way in. With Slice 3a's routing eligibility gating, point operations on
fully-specified partition keys now flow end-to-end over Gateway 2.0 when
the routing decision selects it.

The wrap helper consumes a signed `HttpRequest`, reuses the exact
`x-ms-date` and `Authorization` values written by `sign_request` (so
signature verification still holds), packs the 11 RNTBD request tokens
required by the proxy (Authorization, PayloadPresent, Date,
ConsistencyLevel, DatabaseName, CollectionName, DocumentName,
TransportRequestId, EffectivePartitionKey, SDKSupportedCapabilities,
GlobalDatabaseAccountName), and returns a brand-new `HttpRequest` carrying
just the outer `User-Agent` and `x-ms-activity-id` headers. The unwrap
helper runs only on outer HTTP 200 — outer non-200 responses (proxy or
transport errors) pass through unchanged so they surface as their actual
status, not a synthesized `TRANSPORT_GENERATED_503`.

Consistency resolution lives in a new `resolve_effective_consistency`
helper that implements the precedence chain from spec §5.2 at
operation-pipeline scope, then flows through `TransportRequest` so the
wrap helper never inspects HTTP headers for consistency.

Failure modes:
- Wrap failure (missing signed header, malformed activity_id, bad
  resource link, missing account name) → `CLIENT_GENERATED_400`
  TransportError, `RequestSentStatus::NotSent`. Adds a new
  `SubStatusCode::CLIENT_GENERATED_400` (20400) mirroring the existing
  `CLIENT_GENERATED_401` pattern.
- Unwrap failure (outer-200 with undecodable RNTBD body, or inner status
  outside 100..=599) → `TRANSPORT_GENERATED_503`,
  `RequestSentStatus::Sent`.
- Outer non-200 → outer triple passes through unchanged, no unwrap.
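
The three failure modes above amount to a small decision table, sketched here with hypothetical types. Only the status names (`CLIENT_GENERATED_400`, `TRANSPORT_GENERATED_503`), `RequestSentStatus::{NotSent, Sent}`, and the 100..=599 range come from this commit message; `Stage`, `Outcome`, and `classify` are invented for the illustration.

```rust
#[derive(Debug, PartialEq)]
enum RequestSentStatus {
    NotSent,
    Sent,
}

/// Hypothetical summary of what the Gateway 2.0 path observed.
enum Stage {
    /// Wrapping the signed request into an RNTBD frame failed.
    WrapFailed,
    /// A response arrived; `inner_status` is `None` when the RNTBD body
    /// could not be decoded.
    Response { outer_status: u16, inner_status: Option<u16> },
}

#[derive(Debug, PartialEq)]
enum Outcome {
    ClientGenerated400 { sent: RequestSentStatus },
    TransportGenerated503 { sent: RequestSentStatus },
    /// Outer non-200: surfaced unchanged, no unwrap attempted.
    PassThrough { status: u16 },
    /// Outer 200 with a decodable inner status in range.
    Unwrapped { inner_status: u16 },
}

fn classify(stage: Stage) -> Outcome {
    match stage {
        Stage::WrapFailed => Outcome::ClientGenerated400 {
            sent: RequestSentStatus::NotSent,
        },
        Stage::Response { outer_status, .. } if outer_status != 200 => {
            Outcome::PassThrough { status: outer_status }
        }
        Stage::Response { inner_status: Some(s), .. } if (100..=599).contains(&s) => {
            Outcome::Unwrapped { inner_status: s }
        }
        // Outer 200 but the body was undecodable or the inner status was out of range.
        Stage::Response { .. } => Outcome::TransportGenerated503 {
            sent: RequestSentStatus::Sent,
        },
    }
}

fn main() {
    println!("{:?}", classify(Stage::WrapFailed));
    println!("{:?}", classify(Stage::Response { outer_status: 502, inner_status: None }));
    println!("{:?}", classify(Stage::Response { outer_status: 200, inner_status: Some(201) }));
    println!("{:?}", classify(Stage::Response { outer_status: 200, inner_status: None }));
}
```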

Files:

- `transport/gateway20_dispatch.rs` (new): `WrapInputs`, `wrap_request_for_gateway20`, `unwrap_response_for_gateway20`, `parse_resource_names`, `effective_partition_key_bytes`, `next_transport_request_id` (process-wide AtomicU32), 13 unit tests.
- `transport/rntbd/tokens.rs`: new `RntbdRequestToken` enum with 11 verified IDs and `Token::*` named constructors. IDs cross-referenced against Java's `RntbdConstants.RntbdRequestHeader`.
- `transport/transport_pipeline.rs`: wrap call site between `sign_request` and `execute_http_attempt` (gated on `transport_mode == Gateway20`); unwrap call site inside `finalize_http_attempt`'s Response branch before `map_http_response_payload` (gated on `transport_mode == Gateway20` AND outer status == 200); new `gateway20_wrap_error_result`, `gateway20_unwrap_error_result`; `TransportPipelineContext` carries `account_name`; 6 new integration tests.
- `pipeline/components.rs`: `TransportRequest` carries `transport_mode`, `operation_type`, `partition_key`, `partition_key_definition`, `effective_consistency`.
- `pipeline/operation_pipeline.rs`: real `account_name: Option<String>` binding, `effective_consistency` resolved at op-pipeline scope, all five new fields populated in `build_transport_request`.
- `options/read_consistency.rs`: new `resolve_effective_consistency(strategy, account_default) -> DefaultConsistencyLevel` per spec §5.2, with 4×5 table test.
- `models/cosmos_status.rs`: `SubStatusCode::CLIENT_GENERATED_400` (20400) and `CosmosStatus::CLIENT_GENERATED_400`.
- `transport/cosmos_headers.rs`, `transport/mod.rs`, `options/mod.rs`: exports for the new helpers/constants.

20 new tests; 744 passed, 0 failed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Bumps `next_transport_request_id` from `Ordering::Relaxed` to
`Ordering::AcqRel`. `Relaxed` already guarantees uniqueness on
`fetch_add` (the read-modify-write is atomic regardless of ordering), so
this is not a correctness fix. The stronger ordering additionally orders
the increment relative to surrounding memory accesses, which is intended
to avoid apparent ID-collision confusion when two concurrent requests are
inspected in diagnostic traces.

Found in deep-review pass over the Gateway 2.0 stack.
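
The uniqueness point can be checked with a small standalone sketch. The counter name mirrors the commit message, but the `ordering` parameter and the `collect_ids` helper are hypothetical additions so that both orderings can be exercised side by side.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// Process-wide counter, as in the commit message.
static NEXT_TRANSPORT_REQUEST_ID: AtomicU32 = AtomicU32::new(0);

/// Returns the next ID. The `ordering` parameter exists only for this demo.
fn next_transport_request_id(ordering: Ordering) -> u32 {
    NEXT_TRANSPORT_REQUEST_ID.fetch_add(1, ordering)
}

/// Draws `per_thread` IDs on each of `threads` threads and returns them all.
fn collect_ids(threads: usize, per_thread: usize, ordering: Ordering) -> Vec<u32> {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                (0..per_thread)
                    .map(|_| next_transport_request_id(ordering))
                    .collect::<Vec<u32>>()
            })
        })
        .collect();
    handles
        .into_iter()
        .flat_map(|handle| handle.join().unwrap())
        .collect()
}

fn main() {
    for ordering in [Ordering::Relaxed, Ordering::AcqRel] {
        let ids = collect_ids(8, 1_000, ordering);
        let unique: HashSet<u32> = ids.iter().copied().collect();
        // Every fetch_add observes a distinct previous value under either ordering.
        assert_eq!(unique.len(), ids.len());
        println!("{ordering:?}: {} unique ids", unique.len());
    }
}
```

Uniqueness holds until the `u32` wraps; the ordering argument only changes how the increment is ordered relative to other memory accesses.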

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds the Phase 6 deliverables called out in
`docs/GATEWAY_20_SPEC.md` (lines ~500-594):

**CI infrastructure (NEW)**

- `sdk/cosmos/ci-gateway20.yml` — dedicated PR-and-manual-dispatch
  pipeline that runs Gateway 2.0 live tests against a pre-provisioned
  thin-client account. Reads `AZURE_COSMOS_GW20_ENDPOINT` /
  `AZURE_COSMOS_GW20_KEY` from pipeline secrets (per spec Q2).
- `sdk/cosmos/live-gateway20-matrix.json` — matrix with
  `gateway20` and `gateway20_multi_region` test categories.

**Driver pipeline tests (NEW)**

- `azure_data_cosmos_driver/tests/gateway20_pipeline_tests.rs` — 7
  integration tests gated on the `__internal_mocking` feature. Uses a
  capturing `HttpClientFactory` to inspect outgoing requests.
  Coverage: operator-override pool flag, V1 dual-consistency-header
  invariant, V2 dual-token contract lock, capabilities header pin
  (`x-ms-cosmos-sdk-supportedcapabilities = "9"`). Live-account
  companions for stored-proc fallback, diagnostics validation, and
  operator override are stubbed with TODO(Phase 6) markers — they
  require either a public SDK Gateway 2.0 toggle or per-request
  diagnostics surfaces that don't exist yet.

**Driver fault injection (EDIT)**

- `driver_fault_injection.rs` — adds 3 emulator-gated contract locks:
  503 → regional failover, 408 → cross-region for reads, 404/1002 →
  remote-preferred without PKRange refresh. Today the rules fire on
  whichever transport is selected at dispatch (the
  `FaultInjectionCondition` API does not yet support a per-transport-
  kind filter); each test carries a TODO(Phase 6) describing the
  tightening once that filter lands.

**SDK tests (NEW + EDIT)**

- `azure_data_cosmos/tests/emulator_tests/gateway20_e2e.rs` — 6
  scaffolded E2E stubs gated on `test_category = "gateway20"` plus
  the `AZURE_COSMOS_GW20_ENDPOINT/_KEY` env vars. Bodies are intentionally
  empty until `CosmosClientOptions` exposes a public Gateway 2.0
  toggle; the test names lock in the contract.
- `cosmos_fault_injection.rs` — adds a Gateway 2.0 ConnectionError
  fallback contract lock with a TODO(Phase 6) marker.
- `mod.rs` — registers the new `gateway20_e2e` module.
- `azure_data_cosmos/build.rs` — adds `gateway20` to the recognized
  `test_category` cfg values so the new e2e file compiles without
  `--cfg` warnings.
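
As a rough illustration of the `build.rs` change, a build script can declare the allowed values of a custom cfg through Cargo's `rustc-check-cfg` directive. The helper name `check_cfg_directive` is hypothetical, and the actual script may use a different directive spelling or list more values than shown here.

```rust
/// Builds the Cargo directive that declares which values the custom
/// `test_category` cfg may take, so rustc does not warn on
/// `#[cfg(test_category = "gateway20")]`.
/// Hypothetical helper; the real build.rs may be structured differently.
pub fn check_cfg_directive(values: &[&str]) -> String {
    let quoted: Vec<String> = values.iter().map(|v| format!("\"{v}\"")).collect();
    format!(
        "cargo::rustc-check-cfg=cfg(test_category, values({}))",
        quoted.join(", ")
    )
}

// In build.rs, the directive would be printed to stdout for Cargo to read:
//
//     fn main() {
//         println!("{}", check_cfg_directive(&["gateway20"]));
//     }
```

Test functions gated with `#[cfg(test_category = "gateway20")]` then compile only when the matching `--cfg` flag is passed through the Rust flags.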

**Verification:** `cargo fmt`, `cargo clippy --all-features
--all-targets -- -D warnings`, `cargo build --all-features --tests`,
and `cargo test --all-features` all pass on both crates. Driver
maintains the 744-test baseline; SDK maintains the 463-test baseline.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
tvaron3 and others added 15 commits April 30, 2026 08:15
Adds a "Capability bit composition" subsection to GATEWAY_20_SPEC.md
under "SDK-supported-capabilities advertisement" that explains why the
Rust driver advertises bitmask `9` while Java advertises `11`.

The table breaks down each bit (PartitionMerge=1,
IgnoreUnknownRntbdTokens=8, plus an additional Java-only capability at
bit 1=2 left unnamed pending verification against Java source) and
explicitly states that the Rust driver only advertises capabilities it
implements end-to-end. Adding any new bit requires implementing the
behavior first, then incrementing `SUPPORTED_CAPABILITIES_BITS` in
`cosmos_headers.rs` and re-pinning the Phase 6 header-value test.
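
The bit arithmetic described above can be sketched as follows. The bit values come from this commit message; the constant names other than `SUPPORTED_CAPABILITIES_BITS`, the `u64` type, and the `capabilities_header_value` helper are assumptions for illustration.

```rust
/// PartitionMerge capability (bit 0, value 1).
pub const PARTITION_MERGE: u64 = 1;
/// IgnoreUnknownRntbdTokens capability (bit 3, value 8).
pub const IGNORE_UNKNOWN_RNTBD_TOKENS: u64 = 8;
/// Java-only capability at bit 1 (value 2); unnamed pending verification.
pub const JAVA_ONLY_UNNAMED: u64 = 2;

/// Only capabilities implemented end-to-end are advertised: 1 | 8 = 9.
pub const SUPPORTED_CAPABILITIES_BITS: u64 = PARTITION_MERGE | IGNORE_UNKNOWN_RNTBD_TOKENS;

/// Renders the bitmask as the decimal string sent in the
/// `x-ms-cosmos-sdk-supportedcapabilities` header.
pub fn capabilities_header_value(bits: u64) -> String {
    bits.to_string()
}
```

Adding the Java-only bit to the Rust mask would give `9 | 2 = 11`, which is the value Java advertises.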

This addresses one of the documented follow-ups from PR Azure#4319.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per Gateway 2.0 spec §3, options that gate Gateway 2.0 must use
negative-term names so the boolean default of `false` corresponds to
the GA target state (Gateway 2.0 enabled). `is_gateway20_allowed`
predated that rule.

Changes:

* `ConnectionPoolOptions::is_gateway20_allowed` → `gateway20_disabled`
  (semantics inverted) — field, getter, builder field, builder setter.
* `ConnectionPoolOptionsBuilder::with_is_gateway20_allowed` →
  `with_gateway20_disabled` (semantics inverted).
* Removed the `AZURE_COSMOS_CONNECTION_POOL_IS_GATEWAY20_ALLOWED` env
  var lookup. Spec §3 forbids env-var toggles for Gateway 2.0; getter
  doc-comment now explains why.
* HTTP/2-disabled-forces-disabled rule preserved
  (`gateway20_requires_http2` test still verifies it).
* Pre-GA the default stays at `true` (disabled) with an explicit TODO
  to flip to `false` once Slice 3d (EPK cutover), HPK partial-PK, and
  continuation-token format are complete. Defaulting on while the
  codepath is incomplete would route customer traffic through an
  unfinished pipeline.
* All call sites in driver, transport tests, pipeline tests, the
  `gateway20_e2e` doc, and the perf binary updated. Booleans inverted
  at every `capturing_runtime(...)` and builder call.
* TRANSPORT_PIPELINE_SPEC.md prose reference renamed in lock-step.
* Perf binary's `gateway20_disabled` diagnostic field is hardcoded
  `true` for now with TODO to wire from the SDK option once that
  builder method lands (Phase A item 5).
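
A minimal sketch of the flag semantics at this commit, assuming a simplified options type. `PoolOptions`, `PoolOptionsBuilder`, and `with_http2_allowed` are hypothetical stand-ins; only `with_gateway20_disabled`, the pre-GA default of `true`, and the HTTP/2 rule are taken from the text above.

```rust
/// Hypothetical, simplified stand-in for `ConnectionPoolOptions`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PoolOptions {
    gateway20_disabled: bool,
}

impl PoolOptions {
    pub fn gateway20_disabled(&self) -> bool {
        self.gateway20_disabled
    }
}

/// Hypothetical, simplified stand-in for `ConnectionPoolOptionsBuilder`.
#[derive(Debug, Default)]
pub struct PoolOptionsBuilder {
    gateway20_disabled: Option<bool>,
    http2_allowed: Option<bool>,
}

impl PoolOptionsBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn with_gateway20_disabled(mut self, disabled: bool) -> Self {
        self.gateway20_disabled = Some(disabled);
        self
    }

    /// Assumed setter name; the real HTTP/2 option may be spelled differently.
    pub fn with_http2_allowed(mut self, allowed: bool) -> Self {
        self.http2_allowed = Some(allowed);
        self
    }

    pub fn build(self) -> PoolOptions {
        // Assumption: HTTP/2 is allowed unless the caller says otherwise.
        let http2_allowed = self.http2_allowed.unwrap_or(true);
        // Pre-GA default at this commit: Gateway 2.0 stays disabled.
        let requested = self.gateway20_disabled.unwrap_or(true);
        // Disabling HTTP/2 forces Gateway 2.0 off regardless of the flag.
        PoolOptions {
            gateway20_disabled: requested || !http2_allowed,
        }
    }
}
```

With this shape, `PoolOptionsBuilder::new().with_gateway20_disabled(false).build()` enables Gateway 2.0, while adding `.with_http2_allowed(false)` turns it back off.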

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
These were re-exports of the canonical driver-level
`GATEWAY20_OPERATION_TYPE` / `GATEWAY20_RESOURCE_TYPE` constants,
marked `#[deprecated(since = "0.33.0")]`. The Gateway 2.0 work is
pre-GA and there are no released callers depending on the old
identifier — back-compat aliases are not needed and only add API
surface that we'd have to delete later.

Removes:

* `THINCLIENT_PROXY_OPERATION_TYPE`
* `THINCLIENT_PROXY_RESOURCE_TYPE`

The `COSMOS_ALLOWED_HEADERS` macro continues to reference the
canonical driver constants directly (unchanged), so the headers stay
on the logging allowlist.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds an optional `TransportKind` filter to fault injection rules so
that callers can scope a rule to Gateway 1.x or Gateway 2.0 traffic
without affecting metadata clients or the other dataplane transport.

Driver:
* `FaultInjectionCondition` gains a `transport_kind: Option<TransportKind>`
  field plus a `with_transport_kind` builder method.
* `FaultInjectionEvaluation` gains a `TransportKindMismatch` variant
  emitted when a rule restricts to a transport that this client does
  not serve.
* `FaultClient` now carries the bound transport kind (set at
  construction by `FaultInjectingHttpClientFactory` from
  `HttpClientConfig`). Metadata clients have `transport_kind == None`
  and so never match a rule that requires a specific transport — this
  prevents a Gateway-2.0 rule from accidentally firing on account
  discovery or other metadata traffic.
* `HttpClientConfig` records the transport kind for each constructor:
  metadata = None, dataplane gateway = Some(Gateway), dataplane
  Gateway 2.0 = Some(Gateway20).
* Three new unit tests in `fault_injection::http_client::tests` cover
  the match / mismatch / metadata-client cases.
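
The matching rule for the three cases can be sketched as below. The enum and function here are simplified, hypothetical stand-ins for the driver's `TransportKind`, `FaultInjectionEvaluation`, and `FaultClient` logic; treating an unfiltered rule as matching every client is an assumption.

```rust
/// Simplified stand-in for the driver's `TransportKind`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TransportKind {
    Gateway,
    Gateway20,
}

/// Simplified stand-in for the relevant `FaultInjectionEvaluation` outcomes.
#[derive(Debug, PartialEq, Eq)]
pub enum Evaluation {
    Matched,
    TransportKindMismatch,
}

/// Decides whether a rule's optional transport filter admits a client.
/// `client` is `None` for metadata clients, so they never satisfy a rule
/// that requires a specific transport.
pub fn evaluate_transport(
    rule: Option<TransportKind>,
    client: Option<TransportKind>,
) -> Evaluation {
    match rule {
        // No filter on the rule (assumed to apply to every client).
        None => Evaluation::Matched,
        Some(required) if client == Some(required) => Evaluation::Matched,
        Some(_) => Evaluation::TransportKindMismatch,
    }
}
```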

SDK:
* `FaultInjectionCondition` mirrors the new field and builder method.
* `fault_injection::TransportKind` is re-exported from the driver so
  SDK consumers do not need to depend on the driver crate.
* `driver_bridge` translates the SDK-side `transport_kind` through to
  the driver builder.

Tests:
* The three Gateway 2.0 fault-injection tests in
  `driver_fault_injection.rs` now scope their rules to
  `TransportKind::Gateway20` and are gated behind the `gateway20`
  test category — the rule semantics are now correct, and the tests
  no longer fire spuriously on emulator account-discovery traffic.
* The matching SDK-side test in `cosmos_fault_injection.rs` is
  similarly scoped and gated. Once the public SDK Gateway 2.0 toggle
  lands, the assertion will flip from 'rule never fires' to 'read
  succeeds via the standard-gateway fallback'.
* `azure_data_cosmos_driver/build.rs` declares the new `gateway20`
  test_category value (the SDK `build.rs` already declared it).

Resolves the `with_transport_kind` follow-up from PR Azure#4319.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds a public `with_gateway20_disabled(bool)` method on
`CosmosClientBuilder` that propagates the flag to the underlying driver's
`ConnectionPoolOptionsBuilder`. This is the SDK-level entry point that
operators use to opt in to (or out of) the Gateway 2.0 transport.

Pre-GA, the toggle defaults to `true` (Gateway 2.0 suppressed) so the
behavioural change must be explicitly requested. The negative-term name
mirrors the driver-side flag and follows the negative-term policy from
GATEWAY_20_SPEC §3.

With the public toggle in place, fills the placeholder bodies in
`gateway20_e2e.rs` for point CRUD, query, transactional batch,
diagnostics validation, and the operator-override case. Each test
provisions a fresh database+container against the live Gateway 2.0
account, drives the operation, and asserts the standard
`CosmosDiagnostics` fields are populated. The change-feed test stays
empty until the SDK exposes a public change-feed API; `TransportKind`
assertions are documented as future work pending CosmosDiagnostics
exposure of the driver transport kind.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Applies the Slice 3d cutover to the Gateway 2.0 wrap path: instead of
always emitting a single EffectivePartitionKey RNTBD token, dispatch now
emits *either* a point EPK token *or* an outer HTTP range header pair,
never both.

- Replace effective_partition_key_bytes with effective_partition_key_payload
  that calls EffectivePartitionKey::compute_range and branches on
  start == end (point) vs strict prefix (range).
- Point ops (full PK or single-hash) keep emitting the EPK RNTBD token
  (0x005A) — current proxy contract preserved.
- HPK partial-PK dispatches emit x-ms-thinclient-range-min/-max as outer
  HTTP headers (canonical un-padded hex), matching .NET's
  ProxyStartEpk/ProxyEndEpk and the spec's range header wire format.
- compute_range errors propagate as DataConversion (mapped to BadRequest
  upstream) rather than emit broken EPK metadata.
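
The point-versus-range branch can be sketched as follows, assuming the EPK range has already been computed as a pair of hex strings. The `EpkPayload` type, the string-based signature, and `outer_headers` are hypothetical; only the header names and the `start == end` test come from the description above.

```rust
pub const RANGE_MIN_HEADER: &str = "x-ms-thinclient-range-min";
pub const RANGE_MAX_HEADER: &str = "x-ms-thinclient-range-max";

/// Hypothetical representation of what the wrap path emits for a request.
#[derive(Debug, PartialEq, Eq)]
pub enum EpkPayload {
    /// Full PK or single-hash key: carried as the EPK RNTBD token.
    PointToken(String),
    /// HPK prefix: carried as an outer HTTP range header pair.
    RangeHeaders { min: String, max: String },
}

/// Chooses the payload shape from an already computed EPK range.
pub fn effective_partition_key_payload(start: &str, end: &str) -> EpkPayload {
    if start == end {
        EpkPayload::PointToken(start.to_owned())
    } else {
        EpkPayload::RangeHeaders {
            min: start.to_owned(),
            max: end.to_owned(),
        }
    }
}

/// Outer HTTP headers for a payload; empty for point operations, so the
/// token and the range headers are never emitted together.
pub fn outer_headers(payload: &EpkPayload) -> Vec<(&'static str, String)> {
    match payload {
        EpkPayload::PointToken(_) => Vec::new(),
        EpkPayload::RangeHeaders { min, max } => vec![
            (RANGE_MIN_HEADER, min.clone()),
            (RANGE_MAX_HEADER, max.clone()),
        ],
    }
}
```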

Adds three regression tests:
- wrap_emits_range_headers_for_hpk_prefix_partition_key
- wrap_emits_token_only_for_full_hpk_partition_key
- wrap_rejects_partition_key_with_too_many_components

Refs PR Azure#4319 follow-up: Slice 3d EPK cutover for queries/read-feeds.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closes the spec test-coverage row 'HPK + Gateway 2.0: full vs partial PK'
on the SDK side: the unit-level header-emission proof was added with the
Slice 3d cutover; this commit guards the public SDK surface against
regressions where partial-PK queries would silently degrade.

The new test:

- Provisions a 3-component HPK container
  (/tenantId, /userId, /sessionId).
- Inserts items spread across two tenants × two users × two sessions.
- Reads one item back via its full 3-component PK, exercising the
  EPK-token point-op path.
- Queries with a 1-component prefix (tenantId only) — the dispatcher
  emits x-ms-thinclient-range-min/-max — and asserts:
  * at least one page is returned;
  * every returned item belongs to the targeted tenant (no cross-tenant
    bleed);
  * the set of returned IDs matches the expected per-tenant set.

PartitionKey only has tuple From-impls for 2 and 3 components; the
1-component prefix is constructed from a Vec<PartitionKeyValue> so the
dispatcher sees an HPK partial PK rather than a single-hash key.

Refs PR Azure#4319 follow-up: HPK partial-PK paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The wrap path was dropping the inbound x-ms-continuation header before
serializing the RNTBD frame, so paginated queries on Gateway 2.0 always
restarted from page one and never advanced.

- Add RntbdRequestToken::ContinuationToken (0x0006, String) to mirror
  Java's RntbdRequestHeader.ContinuationToken and .NET's behavior, where
  the continuation header is not in thinClientProxyExcludedSet.
- Plumb the inbound x-ms-continuation header into the RNTBD metadata
  stream as a string token; values are passed through verbatim
  (including empty strings) for symmetry with the unwrap side and the
  .NET/Java implementations.
- Document the request/response continuation-token format in
  GATEWAY_20_SPEC.md.
- Add 3 driver-layer unit tests covering present/absent/empty header
  scenarios, plus an emulator-only E2E test
  (gateway20_query_paginates_via_continuation_tokens) that forces
  multi-page pagination and asserts no row is returned twice.
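
The present/absent/empty cases can be sketched as below. `StringToken` and `continuation_token` are hypothetical names, and emitting no token when the header is absent is an assumption; the token id `0x0006` and the verbatim pass-through of empty strings come from the description above.

```rust
/// RNTBD request token id for the continuation token.
pub const CONTINUATION_TOKEN_ID: u16 = 0x0006;

/// Hypothetical, simplified string-valued RNTBD token.
#[derive(Debug, PartialEq, Eq)]
pub struct StringToken {
    pub id: u16,
    pub value: String,
}

/// Maps the inbound `x-ms-continuation` header to an RNTBD token.
/// The value is passed through verbatim, including the empty string;
/// an absent header is assumed to produce no token.
pub fn continuation_token(header: Option<&str>) -> Option<StringToken> {
    header.map(|value| StringToken {
        id: CONTINUATION_TOKEN_ID,
        value: value.to_owned(),
    })
}
```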

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pulls in PR Azure#4223 (Gateway 2.0 design spec) plus other release-branch
work so the PR diff no longer shows GATEWAY_20_SPEC.md as a new file.

Conflict resolution:
- GATEWAY_20_SPEC.md: keep both new HEAD subsections (continuation-token
  format + capability-bit composition table); upstream side was empty.

Post-merge integration fixes (call sites broken by upstream signature changes):
- operation_pipeline.rs: add 'effective_consistency: Session' field to
  two TransportRequestContext literals in the batch-headers tests.
- driver_fault_injection.rs: add 'item1' item-id arg to three
  context.create_item(...) calls (matches new test_client.rs API).
- gateway20_e2e.rs / cosmos_fault_injection.rs: add item-id arg to
  seven container.create_item(...) calls (matches new SDK API).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Gateway 2.0 (the next-generation Cosmos DB dataplane transport) is now
on by default whenever the account advertises a thin-client endpoint
and HTTP/2 is allowed on the connection-pool options. Operators can
still opt out per-client via `ConnectionPoolOptionsBuilder::with_gateway20_disabled(true)`
(driver) or `CosmosClientBuilder::with_gateway20_disabled(true)` (SDK).

The doc comments on both opt-out methods (and on the underlying field
+ accessor) are rewritten to:

* Drop the negative-term explanation that framed Gateway 2.0 as a
  pre-GA opt-in. Gateway 2.0 is on by default and the docs now say so
  directly.
* Add a 'Latency caveat' section noting that Gateway 2.0 is not
  covered by the Cosmos DB regional latency SLA. Workloads with
  strict latency requirements should evaluate before relying on it.

Test impact: the only test asserting on the default is
`connection_pool::tests::connection_pool_options_builder_defaults`;
the other tests either set the flag explicitly or force HTTP/2 off
(which short-circuits to disabled regardless of the default).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The standalone `sdk/cosmos/ci-gateway20.yml` pipeline is removed. The
Gateway 2.0 ("thin client") live tests now run as a second
`LiveTestMatrixConfigs` entry on the main `sdk/cosmos/ci.yml`
pipeline. This mirrors how the Java Cosmos SDK plumbs its thin-client
live tests in `sdk/cosmos/tests.yml` (Azure/azure-sdk-for-java).

The new entry (`Cosmos_gateway20_live_test`) points at
`live-gateway20-matrix.json` and reuses the same `azure-sdk-tests-cosmos`
service connection. Two pipeline-level `EnvVars` are wired in so the
tests can connect to a pre-provisioned thin-client account that is not
created per-pipeline-run:

* `AZURE_COSMOS_GW20_ENDPOINT` ← `$(thinclient-test-endpoint)`
* `AZURE_COSMOS_GW20_KEY` ← `$(thinclient-test-key)`

(NOTE: those secret variable names follow Java's convention. They may
need to be renamed to whatever the Cosmos service connection actually
exposes; this can be verified on the next live-test run.)

The matrix machinery still requires `ArmTemplateParameters`, so the
deploy step continues to create a throwaway account; the Gateway 2.0
tests just ignore it and connect via the env vars instead. The
existing test-category gating (`gateway20` /
`gateway20_multi_region`) flows through the bicep template into
`COSMOS_RUSTFLAGS`, which gates the
`#[cfg(test_category="gateway20")]` test functions — independent of
the account the tests connect to.

Doc comments in `gateway20_e2e.rs` and the
`GATEWAY_20_SPEC.md` file-changes table are updated to reference the
consolidated pipeline.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaces the SDK's parallel fault-injection type system with re-exports
from azure_data_cosmos_driver::fault_injection. The duplicate types
existed only because the SDK and driver each had their own injection
module; with the driver fault-injection now wired into the gateway
transport via a shared rule slice, there is no reason to keep two
copies.

Removed:
- sdk/.../fault_injection/{condition,result,rule}.rs (~590 lines)
- driver_bridge::sdk_fi_rules_to_driver_fi_rules() and the entire
  feature-gated translation block (~127 lines)
- The dual-state Arc<AtomicBool>/Arc<AtomicU32> wiring on the SDK rule —
  no longer needed because both transports now share the same
  Arc<FaultInjectionRule>.
- Dead 'passthrough_statuses' tracking on the SDK FaultClient.

Kept SDK-owned:
- FaultInjectionClientBuilder — produces the gateway-side FaultClient
  HTTP transport.
- A small fault_operation_for_sdk(SdkOpType, SdkResType) adapter so
  CosmosRequest::add_fault_injection_headers can stamp the right
  operation tag using the SDK enum (the driver's
  FaultOperationType::from_operation_and_resource takes driver enums).

Other touches:
- Promoted driver FaultInjectionRule::increment_hit_count to pub so the
  SDK FaultClient (separate crate) can call it.
- CustomResponse construction in tests now uses CustomResponseBuilder
  because driver fields are private; accessor calls (status_code(),
  body(), region(), etc.) replace direct field reads.
- Forward-compat: the FaultInjectionErrorType match in apply_fault gets
  a wildcard arm because the driver enum is #[non_exhaustive].
- Updated sdk-to-driver-cutover.md to describe the post-cutover
  architecture and drop references to the deleted bridge function.

Net diff: -1005 / +144 lines.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds entries to both crate CHANGELOGs covering the changes from the
prior three commits in this PR:

- Driver 0.3.0 (Features Added): Gateway 2.0 transport is now the
  default; the new `with_gateway20_disabled()` opt-out toggle, with
  the latency / SLA caveat surfaced explicitly so operators know what
  they're getting into.
- SDK 0.34.0 (Features Added): the matching public
  `CosmosClientBuilder::with_gateway20_disabled()` builder.
- SDK 0.34.0 (Breaking Changes): the fault-injection re-export
  consolidation. The SDK previously had its own
  `FaultInjectionRule`/`FaultInjectionCondition`/etc with public
  fields; field access is now via accessor methods on the driver
  re-exports.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Build Analyze CI stage runs `cargo doc --no-deps --all-features`
and was failing with two issues introduced by the fault-injection
re-export refactor (commit d010978):

1. `error: redundant explicit link target` on the `TransportKind`
   re-export — the explicit `(azure_data_cosmos_driver::diagnostics::TransportKind)`
   target is redundant because rustdoc resolves `[`TransportKind`]`
   to the same destination via the in-scope `pub use`.

2. `warning: public documentation for `fault_injection` links to
   private item `crate::operation_context::OperationType`` (and the
   matching warning for `ResourceType`). Rustdoc's
   `private_intra_doc_links` lint flags these because the SDK's
   internal `OperationType` / `ResourceType` are crate-private; the
   module-level prose can refer to them by bare name without any
   intra-doc link target.

Both fixes are docstring-only — no behavioral change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per GATEWAY_20_SPEC.md §35, 'Gateway 2.0' is the canonical name in
all Rust code, docs, and comments. 'Thin client' is reserved only for
(a) Java/.NET source symbol references and (b) literal wire-header
strings such as x-ms-thinclient-proxy-*.

Renamed (Rust):

- AccountProperties accessor methods on the driver-internal cache:
    has_thin_client_endpoints     → has_gateway20_endpoints
    thin_client_writable_regions  → gateway20_writable_regions
    thin_client_readable_regions  → gateway20_readable_regions
- routing_systems::parse_thin_client_locations and its parameter /
    local-variable names → parse_gateway20_locations.
- E2E test fn gateway20_query_streams_through_thin_client →
    gateway20_query_streams.
- Test fixture URL host central.thinclient.azure.com →
    central.gateway20.azure.com in operation_pipeline.rs.

Renamed (docs/comments/CI):

- Doc comments and inline comments referencing 'thin client' /
    'thin-client' across CHANGELOGs, connection_pool, constants,
    adaptive_transport, account_metadata_cache, diagnostics_context,
    operation_pipeline, routing_systems, fault-injection tests,
    gateway20 pipeline tests, GATEWAY_20_SPEC, TRANSPORT_PIPELINE_SPEC,
    cosmos_client_builder (public-facing API doc), gateway20_e2e
    module doc, and ci.yml comments.
- ci.yml secret variable names $(thinclient-test-endpoint) /
    $(thinclient-test-key) → $(gateway20-test-endpoint) /
    $(gateway20-test-key), mirroring the AZURE_COSMOS_GW20_*
    env-var naming convention. NOTE: the corresponding KeyVault-backed
    entries in the azure-sdk-tests-cosmos service connection's
    variable group may need a matching rename by engsys.
- Log messages 'non-HTTPS thin-client endpoint URL' / 'Duplicate
    thin-client region with conflicting URL' rewritten to use
    'Gateway 2.0'.
- Dropped public-facing parentheticals like 'Gateway 2.0 ("thin
    client")' from CHANGELOG and CosmosClientBuilder docs per the
    user request to stop explaining the legacy term.

Kept (per policy exceptions):

- All x-ms-thinclient-* / x-ms-cosmos-use-thinclient wire header
    strings and the inline-doc comment block in constants.rs that
    explains why those wire names are unchanged (server-defined).
- All Java/.NET symbol references (ThinClientStoreModel,
    ThinClientStoreClient, ThinClientHttpMessageHandler,
    thinClientProxyExcludedSet, useThinClientStoreModel, …) in spec
    text.
- AccountProperties::thin_client_*_locations field names — they
    mirror the wire JSON properties via #[serde(rename_all =
    "camelCase")] and renaming would require explicit
    #[serde(rename = "…")] decorations. Added explanatory note to
    the field doc comments.
- GATEWAY20_USE_THINCLIENT constant identifier — wire-string-mirroring
    suffix.
- GATEWAY_20_SPEC.md historical 'formerly "thin client"' note,
    cspell directive, and the naming policy itself.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Labels

Cosmos The azure_cosmos crate DoNotReview
