SDK-to-Driver Cutover Guide
This guide describes how to route azure_data_cosmos SDK operations through the azure_data_cosmos_driver, replacing the legacy gateway pipeline. Use it as a step-by-step reference when cutting over any operation.
Reference PRs: #4005 (driver bootstrap), #4053 (read_item), #4055 (options alignment), #4111 (create_item).
Rules
- The driver is required, not optional. No gateway fallback within a single operation.
- The SDK's public API does not change. Same signatures, same return types, same observable behavior.
- Each operation is cut over one at a time. Operations not yet cut over continue using the gateway pipeline naturally.
- Do not modify generated code in generated/ subdirectories.
Current State
| Method | Pipeline | PR |
| --- | --- | --- |
| read_item | Driver | #4053 |
| create_item | Driver | #4111 |
| replace_item | Legacy (gateway) | n/a |
| upsert_item | Legacy (gateway) | n/a |
| delete_item | Legacy (gateway) | n/a |
| query_items | Legacy (gateway) | n/a |
| Database/Container CRUD | Legacy (gateway) | n/a |
| Throughput operations | Legacy (gateway) | n/a |
Data Flow
container_client.{operation}(args, options)
  │
  ├── PartitionKey → into_driver_partition_key()
  │
  ├── options.session_token ──→ operation.with_session_token()
  ├── options.precondition  ──→ operation.with_precondition()
  │
  ├── CosmosOperation::{operation}(container_ref, pk)
  │     └── .with_body(bytes)  // writes only
  │
  ├── driver.execute_operation(operation, options.operation)
  │     └── options.operation is OperationOptions, passed directly
  │         (auth, routing, retries, content_response_on_write: all handled by driver)
  │
  └── driver_bridge::driver_response_to_cosmos_response(response)
        └── wrapped in ItemResponse / ResourceResponse / etc.
How to Cut Over an Operation
Step 1: Build the operation
Use the appropriate CosmosOperation::* factory method:
// Point read
let operation = CosmosOperation::read_item(item_ref);

// Write (create, replace, upsert)
let body = serde_json::to_vec(&item)?;
let operation = CosmosOperation::create_item(self.container_ref.clone(), driver_pk)
    .with_body(body);

// Delete
let operation = CosmosOperation::delete_item(item_ref);
Step 2: Wire session token and precondition
These go on CosmosOperation, not OperationOptions. The types are already driver types (re-exported by the SDK), so no conversion is needed:
// `operation` is reassigned here, so it must be declared with `let mut` in Step 1.
if let Some(session_token) = options.session_token {
    operation = operation.with_session_token(session_token);
}
if let Some(precondition) = options.precondition {
    operation = operation.with_precondition(precondition);
}
Step 3: Execute through the driver
Pass options.operation (the embedded OperationOptions) directly:
let driver_response = self
    .driver
    .execute_operation(operation, options.operation)
    .await?;
The driver handles everything: auth, routing, retries, Prefer: return=minimal for writes, custom headers, excluded regions.
Step 4: Bridge the response
Ok(ItemResponse::new(
    crate::driver_bridge::driver_response_to_cosmos_response(driver_response),
))
Use the appropriate public wrapper:
| Public Type | Used For |
| --- | --- |
| ItemResponse<T> | create/read/replace/upsert/delete item |
| ResourceResponse<T> | create/read/delete database/container |
| BatchResponse | transactional batch |
| QueryFeedPage<T> | query operations |
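Steps 1 through 4 can be sketched end to end with stand-in types. Everything in this block is hypothetical scaffolding: `MockDriver`, the field layout of `CosmosOperation`, and the synchronous `execute_operation` are simplifications made so the example compiles on its own. The real driver call is async and the real types carry far more state.

```rust
// Stand-in types for illustration only; the real SDK and driver types differ.
#[derive(Debug, Clone, PartialEq)]
struct SessionToken(String);

#[derive(Debug, Clone, PartialEq)]
struct Precondition(String); // e.g. an If-Match ETag value

#[derive(Debug, Clone, Default)]
struct OperationOptions;

#[derive(Debug, Default)]
struct ItemWriteOptions {
    operation: OperationOptions,
    session_token: Option<SessionToken>,
    precondition: Option<Precondition>,
}

#[allow(dead_code)]
#[derive(Debug, PartialEq)]
struct CosmosOperation {
    body: Option<Vec<u8>>,
    session_token: Option<SessionToken>,
    precondition: Option<Precondition>,
}

impl CosmosOperation {
    fn create_item() -> Self {
        Self { body: None, session_token: None, precondition: None }
    }
    fn with_body(mut self, body: Vec<u8>) -> Self {
        self.body = Some(body);
        self
    }
    fn with_session_token(mut self, token: SessionToken) -> Self {
        self.session_token = Some(token);
        self
    }
    fn with_precondition(mut self, precondition: Precondition) -> Self {
        self.precondition = Some(precondition);
        self
    }
}

struct DriverResponse {
    status: u16,
    body: Vec<u8>,
}

/// Mock driver: echoes the request body back with 201 Created.
struct MockDriver;

impl MockDriver {
    fn execute_operation(
        &self,
        operation: CosmosOperation,
        _options: OperationOptions,
    ) -> Result<DriverResponse, String> {
        let body = operation.body.ok_or("write operations need a body")?;
        Ok(DriverResponse { status: 201, body })
    }
}

#[derive(Debug)]
struct ItemResponse {
    status: u16,
    body: Vec<u8>,
}

fn create_item(
    driver: &MockDriver,
    body: Vec<u8>,
    options: ItemWriteOptions,
) -> Result<ItemResponse, String> {
    // Step 1: build the operation.
    let mut operation = CosmosOperation::create_item().with_body(body);
    // Step 2: session token and precondition go on the operation.
    if let Some(session_token) = options.session_token {
        operation = operation.with_session_token(session_token);
    }
    if let Some(precondition) = options.precondition {
        operation = operation.with_precondition(precondition);
    }
    // Step 3: pass the embedded OperationOptions through unchanged.
    let driver_response = driver.execute_operation(operation, options.operation)?;
    // Step 4: bridge the driver response into the public wrapper.
    Ok(ItemResponse { status: driver_response.status, body: driver_response.body })
}
```

The point of the shape is that the method body contains no header manipulation and no routing logic; anything beyond the four steps is a sign that legacy gateway code is still present.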
Step 5: Clean up the old code
Remove the legacy gateway code from the method body: CosmosRequest::builder(...), options.apply_headers(...), self.container_connection.send(...), and any excluded_regions extraction.
Step 6: Update tests
If the operation has fault injection tests that assert request_url():
- request_url() returns None for driver-routed operations. Use if let Some(url) = response.request_url() { ... } until SDK diagnostics expose the driver's effective endpoint.
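A test helper that tolerates a missing URL might look like the sketch below. `ItemResponse` here is a hypothetical stand-in with a plain `String` URL, and `check_endpoint_if_known` is an illustrative name, not an existing helper in the test suite.

```rust
/// Hypothetical stand-in for the SDK response; only the URL accessor is modeled.
struct ItemResponse {
    request_url: Option<String>,
}

impl ItemResponse {
    fn request_url(&self) -> Option<&str> {
        self.request_url.as_deref()
    }
}

/// Asserts the endpoint only when the response carries a URL.
/// Returns true if the assertion ran, false if it was skipped.
fn check_endpoint_if_known(response: &ItemResponse, expected_host_fragment: &str) -> bool {
    if let Some(url) = response.request_url() {
        assert!(
            url.contains(expected_host_fragment),
            "unexpected endpoint: {url}"
        );
        true
    } else {
        // Driver-routed operation: no URL is available yet, so skip.
        false
    }
}
```

Returning whether the check ran makes the skip visible to the caller, which helps when the skipped assertions are restored later.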
If tests use hit_limit on fault injection rules:
- The driver retries internally (up to 4 attempts for reads), so a single SDK call may consume multiple hits. Multiply hit_limit accordingly.
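The effect of internal retries on hit_limit can be seen in a small simulation. `FaultRule`, `sdk_call`, and `scaled_hit_limit` are hypothetical names for this sketch, and it assumes every faulted attempt consumes exactly one hit and that the driver gives up after a fixed number of attempts.

```rust
/// Attempt ceiling for reads, taken from the guide ("up to 4 attempts for reads").
const MAX_READ_ATTEMPTS: u32 = 4;

/// Minimal model of a fault injection rule with a hit limit.
struct FaultRule {
    hit_limit: u32,
    hits: u32,
}

impl FaultRule {
    fn new(hit_limit: u32) -> Self {
        Self { hit_limit, hits: 0 }
    }

    /// Returns true if the rule fires for this attempt.
    fn try_hit(&mut self) -> bool {
        if self.hits < self.hit_limit {
            self.hits += 1;
            true
        } else {
            false
        }
    }
}

/// One SDK call: the driver retries until an attempt is not faulted.
/// Ok(n) means attempt n succeeded; Err(n) means all n attempts were faulted.
fn sdk_call(rule: &mut FaultRule, max_attempts: u32) -> Result<u32, u32> {
    for attempt in 1..=max_attempts {
        if !rule.try_hit() {
            return Ok(attempt);
        }
    }
    Err(max_attempts)
}

/// hit_limit needed so that `failing_calls` SDK calls fail end to end.
fn scaled_hit_limit(failing_calls: u32, attempts_per_call: u32) -> u32 {
    failing_calls.saturating_mul(attempts_per_call)
}
```

With a hit limit of 1, the first SDK call still succeeds on its second attempt, so a test that expected a failure would pass for the wrong reason. Scaling the limit by the attempt count restores the intended behavior.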
If the operation was used in a test to trigger SDK-pipeline-level PKRange resolution:
- Driver-routed operations handle routing internally and do not trigger PKRange resolution through the SDK pipeline. Switch such tests to use a gateway-routed operation.
Key Files
| File | Role |
| --- | --- |
| src/clients/container_client.rs | Operation implementations. driver and container_ref fields are available. |
| src/driver_bridge.rs | driver_response_to_cosmos_response() and header conversion. Shared by all cutovers. |
| src/options/mod.rs | ItemReadOptions, ItemWriteOptions, BatchOptions: all embed OperationOptions directly. apply_headers() shims exist for gateway-routed operations (remove when cutting over). |
| src/partition_key.rs | into_driver_partition_key(): converts the SDK PartitionKey to the driver PartitionKey. |
Options Architecture
The SDK re-exports driver types directly:
// sdk/cosmos/azure_data_cosmos/src/options/mod.rs
pub use azure_data_cosmos_driver::models::{ETag, Precondition, SessionToken};
pub use azure_data_cosmos_driver::options::{
    ContentResponseOnWrite, OperationOptions, ExcludedRegions, Region, ...
};
ItemReadOptions and ItemWriteOptions both embed OperationOptions:
pub struct ItemReadOptions { // ItemWriteOptions is identical
    pub operation: OperationOptions, // passed directly to driver
    pub session_token: Option<SessionToken>,
    pub precondition: Option<Precondition>,
}
No bridge translation function is needed. The driver resolves OperationOptions through a four-layer hierarchy (env → runtime → account → per-operation).
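A layered resolution of this kind can be sketched with `Option` fallbacks. This is an illustration, not the driver's implementation: the two fields are simplified stand-ins, `or_fallback` and `resolve` are hypothetical names, and the precedence (per-operation overrides account, which overrides runtime, which overrides env) is an assumption read from the arrow order.

```rust
/// Simplified stand-in for OperationOptions: every field is optional so that
/// an unset value can fall through to a lower-priority layer.
#[derive(Debug, Clone, Default, PartialEq)]
struct OperationOptions {
    content_response_on_write: Option<bool>,
    excluded_regions: Option<Vec<String>>,
}

impl OperationOptions {
    /// Fills unset fields of `self` from a lower-priority layer.
    fn or_fallback(self, fallback: &OperationOptions) -> OperationOptions {
        OperationOptions {
            content_response_on_write: self
                .content_response_on_write
                .or(fallback.content_response_on_write),
            excluded_regions: self
                .excluded_regions
                .or_else(|| fallback.excluded_regions.clone()),
        }
    }
}

/// Resolves the effective options. Assumed precedence, highest first:
/// per-operation, account, runtime, env.
fn resolve(
    env: &OperationOptions,
    runtime: &OperationOptions,
    account: &OperationOptions,
    per_operation: &OperationOptions,
) -> OperationOptions {
    per_operation
        .clone()
        .or_fallback(account)
        .or_fallback(runtime)
        .or_fallback(env)
}
```

Because each field falls through independently, a caller can override one setting per call without restating the rest.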
Response Bridge
The driver returns untyped CosmosResponse { body: Vec<u8>, headers, status }. The bridge in driver_bridge.rs reconstructs the SDK's typed CosmosResponse<T>:
- Converts driver CosmosResponseHeaders → raw azure_core::Headers (10 headers: activity ID, request charge, session token, etag, continuation, item count, substatus, server duration, index metrics, query metrics).
- Builds RawResponse::from_bytes(status, headers, body).
- Wraps in CosmosResponse::from_response(response) (sets request: None).
Limitation: Only headers the driver explicitly parses are preserved. Other server headers are lost.
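The limitation follows directly from the shape of the conversion, as this sketch shows. `DriverHeaders` and `to_raw_headers` are hypothetical stand-ins that model only four of the ten headers, and a `BTreeMap` takes the place of azure_core::Headers.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the driver's parsed response headers.
#[derive(Debug, Default)]
struct DriverHeaders {
    activity_id: Option<String>,
    request_charge: Option<f64>,
    session_token: Option<String>,
    etag: Option<String>,
}

/// Rebuilds raw header pairs from the parsed fields. Anything the driver did
/// not parse into a field has no way to reach the output map.
fn to_raw_headers(parsed: &DriverHeaders) -> BTreeMap<&'static str, String> {
    let mut raw = BTreeMap::new();
    if let Some(value) = &parsed.activity_id {
        raw.insert("x-ms-activity-id", value.clone());
    }
    if let Some(value) = parsed.request_charge {
        raw.insert("x-ms-request-charge", value.to_string());
    }
    if let Some(value) = &parsed.session_token {
        raw.insert("x-ms-session-token", value.clone());
    }
    if let Some(value) = &parsed.etag {
        raw.insert("etag", value.clone());
    }
    raw
}
```

Preserving an additional server header therefore means adding a field on the driver side first, then a line in the bridge.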
Fault Injection
Fault injection wiring between SDK and driver is already complete — no additional work needed per cutover. Key facts:
- CosmosClientBuilder::build() translates SDK fault injection rules to driver rules via sdk_fi_rules_to_driver_fi_rules() in driver_bridge.rs.
- enabled and hit_count state is shared via Arc, so toggling a rule affects both gateway and driver paths.
- CustomResponse translation is not yet implemented. Extend the bridge if a test needs it.
- After full cutover, the SDK's fault_injection module will be replaced by re-exporting the driver's types.
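The shared-state behavior described above can be illustrated with `Arc` and atomics. The type and function names here (`SharedRuleState`, `SdkRule`, `DriverRule`, `translate`) are hypothetical; the real translation lives in sdk_fi_rules_to_driver_fi_rules() and carries conditions and results as well.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;

/// State that both the SDK rule and its translated driver rule observe.
#[derive(Clone)]
struct SharedRuleState {
    enabled: Arc<AtomicBool>,
    hit_count: Arc<AtomicU32>,
}

struct SdkRule {
    state: SharedRuleState,
}

struct DriverRule {
    state: SharedRuleState,
}

impl SdkRule {
    fn new() -> Self {
        Self {
            state: SharedRuleState {
                enabled: Arc::new(AtomicBool::new(true)),
                hit_count: Arc::new(AtomicU32::new(0)),
            },
        }
    }
    fn disable(&self) {
        self.state.enabled.store(false, Ordering::SeqCst);
    }
    fn hit_count(&self) -> u32 {
        self.state.hit_count.load(Ordering::SeqCst)
    }
}

/// Translation clones the Arc handles, not the values behind them.
fn translate(rule: &SdkRule) -> DriverRule {
    DriverRule { state: rule.state.clone() }
}

impl DriverRule {
    /// Applies the fault if the rule is enabled and records the hit.
    fn try_apply(&self) -> bool {
        if self.state.enabled.load(Ordering::SeqCst) {
            self.state.hit_count.fetch_add(1, Ordering::SeqCst);
            true
        } else {
            false
        }
    }
}
```

A test that holds only the SDK-side rule can disable it or read its hit count and still see what the driver path did.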
Configuration Gaps
The following SDK-level configuration is not yet forwarded to the driver:
| Gap | Detail |
| --- | --- |
| user_agent_suffix | CosmosDriverRuntimeBuilder::with_user_agent_suffix() exists but the SDK doesn't call it |
| application_region | SDK uses it for routing but doesn't pass it to the driver as the preferred region |
| Client-level custom_headers | Only per-operation custom headers are forwarded |
| Client-level base OperationOptions | Runtime/account-level defaults not set; only env vars or per-call |
| Connection pool / transport | CosmosDriverRuntimeBuilder accepts these but the SDK doesn't forward them |
| Pre-configured driver injection | No CosmosClientBuilder::with_driver() method (#3908) |
These gaps don't affect correctness (driver defaults and env vars work) but matter for production configuration fidelity.
Diagnostics Gap
The driver captures rich per-request diagnostics (RequestDiagnostics: region, endpoint, execution context, retries). The SDK's CosmosDiagnostics currently only surfaces activity_id() and server_duration_ms(). Until this gap is closed:
- ItemResponse::request_url() returns Option<Url> (None for driver-routed operations).
- Failover endpoint assertions in fault injection tests are silently skipped for driver-routed operations.
Tests with skipped endpoint assertions:
| Test File | Test Name |
| --- | --- |
| cosmos_items.rs | assert_response helper (all item tests) |
| cosmos_fault_injection.rs | fault_injection_429_retry_with_hit_limit |
| cosmos_multi_write_retry_policies.rs | read_cross_region_retry_on_408, read_cross_region_retry_on_500 |
| cosmos_multi_write_fault_injection.rs | fault_injection_read_unaffected_by_create_rule, fault_injection_read_region_retry_503, fault_injection_read_session_retry_404_1002, fault_injection_read_connection_error_failover, fault_injection_read_response_timeout_retries_to_satellite, fault_injection_connection_error_local_retry_succeeds |
Infrastructure Prerequisites
The following infrastructure must be built in the driver before certain operations can be fully cut over. These are not per-operation tasks — they are shared foundations.
Wire PartitionKeyRangeCache into CosmosDriver
Status: Not started. The cache type exists (driver/cache/partition_key_range_cache.rs) with full resolve/lookup logic and tests, but CosmosDriver does not hold an instance of it.
What's needed:
- Add a PartitionKeyRangeCache field to CosmosDriver and initialize it in new().
- Add a CosmosOperation::read_all_pk_ranges(container) factory method. The cache needs a fetch function that calls the /pkranges endpoint via execute_operation().
- Expose a public method on CosmosDriver:
  pub async fn resolve_routing_map(
      &self,
      container: &ContainerReference,
      force_refresh: bool,
  ) -> azure_core::Result<Arc<ContainerRoutingMap>>
- Make ContainerRoutingMap public (currently pub(crate) in driver/cache/container_routing_map.rs).
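The force_refresh contract of the proposed method can be sketched with a synchronous stand-in cache. `RoutingMapCache`, the string-keyed container name, and the closure-based fetch are all assumptions for this example; the real PartitionKeyRangeCache is async and fetches via execute_operation().

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Simplified stand-in for the driver's routing map.
#[derive(Debug, PartialEq)]
struct ContainerRoutingMap {
    ranges: Vec<String>,
}

/// Hypothetical cache wrapper; `fetch` stands in for the /pkranges call.
struct RoutingMapCache<F> {
    fetch: F,
    entries: Mutex<HashMap<String, Arc<ContainerRoutingMap>>>,
}

impl<F> RoutingMapCache<F>
where
    F: Fn(&str) -> ContainerRoutingMap,
{
    fn new(fetch: F) -> Self {
        Self { fetch, entries: Mutex::new(HashMap::new()) }
    }

    /// Returns the cached map unless `force_refresh` is set or nothing is cached.
    fn resolve_routing_map(&self, container: &str, force_refresh: bool) -> Arc<ContainerRoutingMap> {
        let mut entries = self.entries.lock().unwrap();
        if !force_refresh {
            if let Some(map) = entries.get(container) {
                return Arc::clone(map);
            }
        }
        // Cache miss or forced refresh: fetch and replace the entry.
        let map = Arc::new((self.fetch)(container));
        entries.insert(container.to_string(), Arc::clone(&map));
        map
    }
}
```

Returning an `Arc` lets query fan-out and feed range lookups share one snapshot of the map without copying it, which matches the proposed signature's return type.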
Who needs it:
- query_items: needs the routing map for cross-partition fan-out
- feed_range_from_partition_key / read_feed_ranges: currently use the SDK-side routing map as a workaround (see below)
- Change feed: will need the routing map for partition-scoped consumption
Current workaround: The feed range API (feed_range_from_partition_key, read_feed_ranges) uses the SDK's ContainerConnection::resolve_routing_map() (gateway-side PartitionKeyRangeCache) for partition lookup, while using the driver's EffectivePartitionKey::compute() / compute_range() for EPK hashing. This produces correct results but maintains two separate routing map caches (SDK-side and driver-side) for the same data. Once the driver exposes resolve_routing_map(), the feed range methods should switch to it.
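The lookup half of this flow, finding the partition key range that owns an already-computed EPK, can be sketched as a binary search over sorted ranges. `PkRange` and `find_range` are hypothetical, the EPK values are made-up hex strings, and the sketch assumes ranges are sorted, contiguous, and compared as plain strings with an inclusive minimum and exclusive maximum.

```rust
/// Simplified stand-in for a partition key range entry in a routing map.
#[derive(Debug, PartialEq)]
struct PkRange {
    id: &'static str,
    min_inclusive: &'static str,
    max_exclusive: &'static str,
}

/// Finds the range whose [min_inclusive, max_exclusive) interval contains `epk`.
/// Assumes `ranges` is sorted by `min_inclusive`.
fn find_range<'a>(ranges: &'a [PkRange], epk: &str) -> Option<&'a PkRange> {
    // Index of the first range whose minimum is greater than the EPK.
    let upper = ranges.partition_point(|range| range.min_inclusive <= epk);
    // The candidate is the range just before that index, if any.
    let candidate = ranges.get(upper.checked_sub(1)?)?;
    (epk < candidate.max_exclusive).then_some(candidate)
}
```

Whichever cache supplies the ranges, the lookup itself is the same, which is why switching the feed range methods to the driver's map is expected to be a small change.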
Remaining Operations to Cut Over
Item operations (on ContainerClient)
| Operation | Options type | Notes |
| --- | --- | --- |
| replace_item | ItemWriteOptions | Same pattern as create_item. Uses CosmosOperation::replace_item(item_ref).with_body(bytes). |
| upsert_item | ItemWriteOptions | Same pattern as create_item. Uses CosmosOperation::upsert_item(container_ref, pk).with_body(bytes). Driver prerequisite: The driver's CosmosOperation::upsert_item() currently takes ItemReference but must be changed to take (ContainerReference, PartitionKey) with into_feed_reference(), matching create_item. The Cosmos DB REST API treats upsert as a POST to the collection feed (same as create), not a PUT to a specific document URL. |
| delete_item | ItemWriteOptions | No body. Uses CosmosOperation::delete_item(item_ref). Prefer: return=minimal is sent but ignored by the service. |
| query_items | Separate query options | Uses QueryExecutor + gateway pipeline with pagination (QueryFeedPage). Fundamentally different flow; needs special bridge logic. |
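The reason upsert needs a feed reference rather than an item reference is visible in how the item operations map onto REST verbs and paths. The sketch below is illustrative: `ItemOp` and `request_target` are hypothetical names, and the function ignores authentication, headers, and URL encoding.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ItemOp {
    Create,
    Upsert,
    Read,
    Replace,
    Delete,
}

/// Returns the HTTP method and resource path for an item operation.
/// Create and upsert target the collection's document feed, so `id` is unused
/// for them; the service tells them apart by an upsert request header.
fn request_target(op: ItemOp, db: &str, coll: &str, id: &str) -> (&'static str, String) {
    let feed = format!("dbs/{db}/colls/{coll}/docs");
    match op {
        ItemOp::Create | ItemOp::Upsert => ("POST", feed),
        ItemOp::Read => ("GET", format!("{feed}/{id}")),
        ItemOp::Replace => ("PUT", format!("{feed}/{id}")),
        ItemOp::Delete => ("DELETE", format!("{feed}/{id}")),
    }
}
```

Since upsert and create share a target, a factory that takes (ContainerReference, PartitionKey) fits upsert the same way it fits create.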
Database and container CRUD
| Client | Operations |
| --- | --- |
| CosmosClient | create_database, query_databases |
| DatabaseClient | read, create_container, query_containers, delete, read_throughput, replace_throughput |
| ContainerClient | read, replace, delete, read_throughput, replace_throughput |
These use ResourceResponse (not ItemResponse) and may require different option types, but the core pattern (build operation → pass options → execute → bridge response) is the same. Throughput operations currently go through OffersClient.
Feed range operations (on ContainerClient)
| Operation | Current state | Blocked on |
| --- | --- | --- |
| read_feed_ranges | Uses SDK-side routing map | Driver resolve_routing_map() |
| feed_range_from_partition_key | Uses driver EPK + SDK-side routing map | Driver resolve_routing_map() |
These are functional today — feed_range_from_partition_key correctly computes MultiHash EPKs via the driver's EffectivePartitionKey::compute() / compute_range() and supports both full and prefix partition keys. The only remaining work is switching the routing map lookup from the SDK-side cache to the driver's cache once the infrastructure prerequisite above is complete.
Post-Cutover Cleanup
After driver resolve_routing_map() is available
- Switch read_feed_ranges and feed_range_from_partition_key from ContainerConnection::resolve_routing_map() to self.driver.resolve_routing_map().
- Remove ContainerConnection::resolve_routing_map(), partition_key_definition(), and collection_rid() accessors.
- Remove from_sdk_partition_key_range() from FeedRange (use from_partition_key_range() with driver types).
After all item operations are driver-routed
- Remove hit-count multiplier workarounds in fault injection tests.
- Remove ContainerConnection (unless still needed for container-level CRUD).
- Remove apply_headers() shims from ItemWriteOptions and BatchOptions in options/mod.rs.
After all operations are driver-routed (full cutover)
Types and modules to remove:
- CosmosRequest and the CosmosResponse::new(response, request) constructor
- Option<CosmosRequest> field on CosmosResponse<T>
- GatewayPipeline and src/pipeline/ (authorization_policy, etc.)
- ContainerConnection and src/handler/
- QueryExecutor and src/query/executor.rs
- OffersClient
- src/routing/ (global_endpoint_manager, location_cache, partition_key_range_cache, etc.)
- src/retry_policies/ (client_retry_policy, metadata_request_retry_policy, resource_throttle_retry_policy)
- src/cosmos_request.rs, src/request_context.rs, src/resource_context.rs, src/routing_strategy.rs
- src/fault_injection/ (entire directory; re-export driver types instead)
- sdk_fi_rules_to_driver_fi_rules() and shared-state accessors in driver_bridge.rs
- All apply_headers() shims in options/mod.rs
- request_url() → Option<Url> on ItemResponse (replace with diagnostics)
- Duplicate header constants in src/constants.rs that overlap with the driver's cosmos_headers.rs
- ~15+ #[allow(dead_code)] annotations, including the legacy SubStatusCode constants block
Remaining in driver_bridge.rs: Only driver_response_to_cosmos_response() and driver_response_headers_to_headers().
Pending independently
- PartitionKey unification: Eliminate into_driver_partition_key() and the dual PartitionKey types.
- SDK diagnostics: Surface the driver's RequestDiagnostics through CosmosDiagnostics, then restore skipped endpoint assertions and remove request_url().
- custom_headers review: Determine whether custom_headers on OperationOptions is still needed at the driver level.
- Driver injection: Add CosmosClientBuilder::with_driver(Arc<CosmosDriver>) (#3908).
Recommended Work Breakdown
The total cutover work is organized into PRs that can be assigned and reviewed independently. Dependencies between items are noted — items without dependencies can be worked in parallel.
Phase 1: Remaining item point operations
These follow the established read_item / create_item pattern exactly. No infrastructure work needed — all can be done in parallel.
| PR | Scope | Complexity |
| --- | --- | --- |
| 1a. replace_item cutover | Rewrite replace_item to use CosmosOperation::replace_item(item_ref).with_body(bytes). Same pattern as create_item. | Small |
| 1b. upsert_item cutover | Rewrite upsert_item. Note: driver's CosmosOperation::upsert_item() currently takes ItemReference but the REST API treats upsert as a POST to the collection feed (like create). May need to change the factory to take (ContainerReference, PartitionKey). | Small |
| 1c. delete_item cutover | Rewrite delete_item to use CosmosOperation::delete_item(item_ref). No body. | Small |
After all three merge: remove apply_headers() shims, update fault injection hit-count multipliers, consider removing ContainerConnection for item routing.
Phase 2: Driver routing map infrastructure
This is the foundational work that unblocks query operations, feed range cleanup, and change feed.
| PR | Scope | Complexity |
| --- | --- | --- |
| 2a. Wire PartitionKeyRangeCache into CosmosDriver | Add cache field, CosmosOperation::read_all_pk_ranges() factory, CosmosDriver::resolve_routing_map() public method, make ContainerRoutingMap public. See Infrastructure Prerequisites for details. | Medium |
| 2b. Switch feed range methods to driver routing map | Replace ContainerConnection::resolve_routing_map() with self.driver.resolve_routing_map() in read_feed_ranges and feed_range_from_partition_key. Remove SDK-side accessors and from_sdk_partition_key_range. Depends on 2a. | Small |
Phase 3: Query operations
Query is fundamentally different from point operations — it involves pagination, cross-partition fan-out, and query plan retrieval. This is the largest single cutover effort.
| PR | Scope | Complexity |
| --- | --- | --- |
| 3a. query_items cutover | Replace QueryExecutor + gateway pipeline with driver-based query execution. Needs the routing map (from Phase 2) for cross-partition fan-out. Must handle QueryFeedPage pagination, continuation tokens, and query/index metrics bridging. Depends on 2a. | Large |
| 3b. query_databases / query_containers cutover | These are simpler queries on CosmosClient / DatabaseClient that can follow whatever pattern query_items establishes. Depends on 3a (for the pattern, not infrastructure). | Medium |
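The pagination contract that a driver-based query path has to preserve can be sketched as a loop over continuation tokens. `QueryFeedPage` here is a two-field stand-in for the SDK type, and `drain_pages` with its closure-based fetch is a hypothetical helper; the real flow is async and also handles fan-out and metrics.

```rust
/// Simplified stand-in for one page of query results.
struct QueryFeedPage<T> {
    items: Vec<T>,
    continuation: Option<String>,
}

/// Fetches pages until the service stops returning a continuation token.
/// `fetch_page` receives the token from the previous page (None for the first).
fn drain_pages<T, F>(mut fetch_page: F) -> Vec<T>
where
    F: FnMut(Option<&str>) -> QueryFeedPage<T>,
{
    let mut all_items = Vec::new();
    let mut continuation: Option<String> = None;
    loop {
        let page = fetch_page(continuation.as_deref());
        all_items.extend(page.items);
        match page.continuation {
            // More results: carry the token into the next request.
            Some(token) => continuation = Some(token),
            // No token: this was the last page.
            None => break,
        }
    }
    all_items
}
```

The continuation token is one of the headers the response bridge already converts, so the bridge for query pages mostly needs to route that value into the page type rather than the item response.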
Phase 4: Database and container CRUD
These use ResourceResponse instead of ItemResponse but follow the same build-operation → execute → bridge-response pattern. Can be done in parallel with each other.
| PR | Scope | Complexity |
| --- | --- | --- |
| 4a. ContainerClient CRUD cutover | read, replace, delete on container. | Small |
| 4b. DatabaseClient CRUD cutover | read, create_container, delete on database. | Small |
| 4c. CosmosClient CRUD cutover | create_database. | Small |
Phase 5: Throughput operations
Currently routed through OffersClient which uses QueryExecutor for reads and CosmosRequest for writes.
| PR | Scope | Complexity |
| --- | --- | --- |
| 5a. Throughput operations cutover | read_throughput and replace_throughput on DatabaseClient and ContainerClient. Depends on 3a (throughput reads use query internally). | Medium |
Phase 6: Cleanup
Once all operations are driver-routed, remove the legacy gateway infrastructure.
| PR | Scope | Complexity |
| --- | --- | --- |
| 6a. Remove gateway pipeline | Delete GatewayPipeline, ContainerConnection, QueryExecutor, OffersClient, CosmosRequest, SDK-side routing/retry, and all apply_headers shims. See Post-Cutover Cleanup for the full list. | Large |
| 6b. Consolidate fault injection | Drop azure_data_cosmos::fault_injection module, re-export driver types directly, remove sdk_fi_rules_to_driver_fi_rules(), update tests. | Medium |
Independent work (no ordering constraints)
These can be done at any point and don't block or depend on the phase work above:
- PartitionKey unification: eliminate into_driver_partition_key() and the dual types.
- SDK diagnostics: surface the driver's RequestDiagnostics through CosmosDiagnostics.
- Driver injection: add CosmosClientBuilder::with_driver() (#3908).
- custom_headers review: determine if still needed at the driver level.
Dependency graph
Phase 1 (item ops)          Phase 2 (routing map)       Independent
  1a. replace_item ──┐        2a. PKRange cache ──┐
  1b. upsert_item  ──┤                            ├── 2b. Feed range cleanup
  1c. delete_item  ──┘                            │
                                                  ├── 3a. query_items
Phase 4 (CRUD)                                    │     ├── 3b. query_dbs/containers
  4a. Container ──┐                               │     └── 5a. Throughput
  4b. Database  ──┤                               │
  4c. Cosmos    ──┘                               │
                                                  │
Phase 6 (cleanup)                                 │
  6a. Remove gateway ─────────────────────────────┘ (after ALL above)
  6b. FI consolidation