[Cosmos] Port read-item to use Driver #4053
Pull request overview
Ports the first SDK data-plane operation (ContainerClient::read_item) to execute through azure_data_cosmos_driver, establishing a reusable “SDK ↔ driver” translation layer intended to be replicated across the remaining operations.
Changes:
- Added an SDK-side `driver_bridge` module to translate `ItemOptions` → `OperationOptions` and driver responses/headers → SDK `CosmosResponse<T>`.
- Reworked `ContainerClient::read_item` to build a driver `ItemReference` + `CosmosOperation` and execute via `CosmosDriver`.
- Extended the driver with `OperationOptions::custom_headers` and exported `PartitionKeyValue` to enable SDK→driver PK conversion.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| sdk/cosmos/azure_data_cosmos_driver/src/options/operation_options.rs | Adds custom_headers to driver OperationOptions. |
| sdk/cosmos/azure_data_cosmos_driver/src/models/partition_key.rs | Makes PartitionKeyValue public for cross-crate construction via From impls. |
| sdk/cosmos/azure_data_cosmos_driver/src/models/mod.rs | Re-exports PartitionKeyValue publicly from models. |
| sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/operation_pipeline.rs | Wires OperationOptions into transport request construction and applies custom_headers. |
| sdk/cosmos/azure_data_cosmos/src/pipeline/mod.rs | Updates gateway path to construct CosmosResponse with Some(request). |
| sdk/cosmos/azure_data_cosmos/src/partition_key.rs | Adds SDK→driver partition key conversion helper. |
| sdk/cosmos/azure_data_cosmos/src/options/mod.rs | Adds ItemOptions accessors used by the driver bridge. |
| sdk/cosmos/azure_data_cosmos/src/models/cosmos_response.rs | Makes stored request optional and updates fault-injection accessors accordingly. |
| sdk/cosmos/azure_data_cosmos/src/lib.rs | Registers the new driver_bridge module. |
| sdk/cosmos/azure_data_cosmos/src/driver_bridge.rs | New module implementing options + response/header translation. |
| sdk/cosmos/azure_data_cosmos/src/clients/container_client.rs | Stores driver/container reference on the client and routes read_item through the driver. |
| sdk/cosmos/azure_data_cosmos/docs/sdk-to-driver-cutover.md | Adds the design specification documenting the cutover approach. |
Command 'rust' is not supported by Azure Pipelines.

Phew, I actually didn't want to run the pipeline right now. @simorenoh when this is passing PR checks, can you kick off the live tests? We should make sure those pass before merging.
analogrelay left a comment
Just minor stuff that Copilot already flagged and commentary on the future (as I tend to do ;))
…ion_pipeline.rs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…-for-rust into read-item-update
API Change Check

APIView identified API level changes in this PR and created the following API reviews

/azp run rust - cosmos - weekly

Azure Pipelines successfully started running 1 pipeline(s).
71668fe into release/azure_data_cosmos-previews
This PR ports an initial method (`read_item`) to use the underlying driver as its connection. For now, I am sharing a spec of the proposed changes, in the hope that the same spec can guide the migration of the remaining methods once we verify this one works.

The spec file is included in the PR to facilitate review and is also reproduced in the description below. Actual code implementation to follow.
SDK-to-Driver Cutover: Design Specification
Overview
This document describes the design for routing `azure_data_cosmos` SDK operations through the `azure_data_cosmos_driver` execution engine, replacing the legacy gateway pipeline path. The first operation cut over is `ContainerClient::read_item`, which serves as the reference pattern for all subsequent operations.

Context
Prior to this work, the Cosmos SDK had two separate execution paths:
- `azure_data_cosmos`: The SDK handled auth, routing, retries, and request construction via `CosmosRequest` → `GatewayPipeline` → HTTP.
- `azure_data_cosmos_driver`: A newer execution engine with its own transport, routing, and operation model (`CosmosOperation` + `OperationOptions`). Previously used only in driver-level tests.

PR #4005 bridged the two worlds by having `ContainerClient::new()` call `driver.resolve_container()` for eager metadata resolution. This PR takes the next step: routing the first data operation through the driver.

Goal
Make the SDK client a thin wrapper over the driver. The SDK translates public-facing types into driver concepts, delegates execution, and translates the response back. All real work (auth, routing, retries, transport) happens inside `driver.execute_operation()`.

Architecture
Data Flow
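As a rough sketch of the flow described under Goal (translate options, delegate to the driver, translate the response back), the following uses simplified stand-in types. `SdkOptions`, `DriverOptions`, `Driver`, and the helper functions are hypothetical and synchronous for brevity; the real `execute_operation()` is awaited and the real types carry much more state.

```rust
// Stand-in types for illustration only; not the real SDK or driver types.
#[derive(Debug, Clone, PartialEq)]
struct SdkOptions {
    session_token: Option<String>,
}

#[derive(Debug, Clone, PartialEq, Default)]
struct DriverOptions {
    session_token: Option<String>,
}

#[derive(Debug)]
struct DriverResponse {
    status: u16,
    body: Vec<u8>,
}

#[derive(Debug, PartialEq)]
struct SdkResponse {
    status: u16,
    body: String,
}

// Hypothetical driver: owns auth, routing, retries, and transport.
struct Driver;

impl Driver {
    fn execute_operation(
        &self,
        item_id: &str,
        _options: &DriverOptions,
    ) -> Result<DriverResponse, String> {
        // Pretend the service returned the item as JSON.
        let body = format!("{{\"id\":\"{item_id}\"}}").into_bytes();
        Ok(DriverResponse { status: 200, body })
    }
}

// Step 1: translate public-facing SDK options into driver options.
fn to_driver_options(options: &SdkOptions) -> DriverOptions {
    DriverOptions {
        session_token: options.session_token.clone(),
    }
}

// Step 3: translate the untyped driver response back into an SDK response.
fn to_sdk_response(response: DriverResponse) -> Result<SdkResponse, String> {
    let body = String::from_utf8(response.body).map_err(|e| e.to_string())?;
    Ok(SdkResponse {
        status: response.status,
        body,
    })
}

// The SDK method is a thin wrapper: translate, delegate (step 2), translate back.
fn read_item(driver: &Driver, item_id: &str, options: &SdkOptions) -> Result<SdkResponse, String> {
    let driver_options = to_driver_options(options);
    let driver_response = driver.execute_operation(item_id, &driver_options)?;
    to_sdk_response(driver_response)
}
```

The wrapper itself contains no routing or retry logic, which is the property the design aims for.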
Key Principle
The SDK's public API does not change. `read_item` retains the same signature, return type, and observable behavior. This is a pure internal refactor.

Design Decision: Driver as Required Infrastructure
An alternative approach was explored where the driver is optional: stored as `Option<Arc<CosmosDriver>>` on `CosmosClient`, `DatabaseClient`, and `ContainerClient`. In that model, each operation checks at runtime whether a driver is available: if so, it takes the driver path; otherwise, it falls back to the legacy gateway pipeline. Container metadata resolution is also optional, and failure is silently ignored.

We chose not to take that approach because we want to verify behavior when only the driver is used, and this single method serves as the test. In this design, the driver is required:
- `CosmosClient` stores `Arc<CosmosDriver>` (not `Option`).
- `ContainerClient::new()` eagerly resolves container metadata via the driver and returns `Result`; if resolution fails, the client cannot be created.

Rationale
The purpose of this cutover is to validate that the driver can fully replace the gateway pipeline for each operation. A fallback path undermines that goal:
- … `OperationOptions` on the driver path), which silently drops user-configured session tokens, etags, and excluded regions.
- … `None` for `request_charge()`, `session_token()`, and `etag()`.

The cutover is intentionally incremental, one operation at a time. Operations that haven't been cut over yet continue using the gateway pipeline naturally (they don't call the driver). This gives us the gradual rollout benefit without the complexity of runtime branching within a single operation.
Type Translation Decisions
PartitionKey (SDK → Driver)
The SDK and driver define separate `PartitionKey` types with identical structure but in different crates. Both represent a JSON array of typed values (string, number, bool, null).

Approach: Added `into_driver_partition_key()` on the SDK's `PartitionKey` that maps each `InnerPartitionKeyValue` variant to the driver's `PartitionKeyValue`.

Driver change required: Made `PartitionKeyValue` `pub` (was `pub(crate)`) so the SDK crate can construct `Vec<PartitionKeyValue>` for the conversion.

Future consideration: Once Ashley's options alignment work unifies these types, this conversion can be eliminated, and we can use the driver's definitions directly, the way we did with `ContainerReference`.
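A minimal sketch of the variant-by-variant mapping, assuming both enums have the four value kinds listed above. The enum definitions and the wrapper structs `SdkPartitionKey` and `DriverPartitionKey` are stand-ins written for this example; the real types live in separate crates and may differ in detail.

```rust
// Stand-in for the SDK's internal value enum (assumed shape).
#[derive(Debug, Clone, PartialEq)]
enum InnerPartitionKeyValue {
    String(String),
    Number(f64),
    Bool(bool),
    Null,
}

// Stand-in for the driver's now-public value enum (assumed shape).
#[derive(Debug, Clone, PartialEq)]
enum PartitionKeyValue {
    String(String),
    Number(f64),
    Bool(bool),
    Null,
}

// One arm per variant: the structures are identical, only the types differ.
impl From<InnerPartitionKeyValue> for PartitionKeyValue {
    fn from(value: InnerPartitionKeyValue) -> Self {
        match value {
            InnerPartitionKeyValue::String(s) => PartitionKeyValue::String(s),
            InnerPartitionKeyValue::Number(n) => PartitionKeyValue::Number(n),
            InnerPartitionKeyValue::Bool(b) => PartitionKeyValue::Bool(b),
            InnerPartitionKeyValue::Null => PartitionKeyValue::Null,
        }
    }
}

// Hypothetical wrappers standing in for the two `PartitionKey` types.
struct SdkPartitionKey(Vec<InnerPartitionKeyValue>);

#[derive(Debug, PartialEq)]
struct DriverPartitionKey(Vec<PartitionKeyValue>);

impl SdkPartitionKey {
    // Consumes the SDK key and builds the driver key component by component.
    fn into_driver_partition_key(self) -> DriverPartitionKey {
        DriverPartitionKey(self.0.into_iter().map(PartitionKeyValue::from).collect())
    }
}
```

Because the conversion needs to construct driver values from outside the driver crate, the driver enum has to be `pub`, which is the visibility change noted above.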
ItemOptions → OperationOptions
The SDK's `ItemOptions` (item-scoped request options) maps to the driver's `OperationOptions` field-by-field. The types in each field differ between crates, so values are bridged via their string representations.

| `ItemOptions` field | `OperationOptions` | Conversion |
|---|---|---|
| `session_token: Option<SessionToken>` | `.with_session_token()` | `DriverSessionToken::new(token.to_string())` |
| `if_match_etag: Option<Etag>` | `.with_etag_condition()` | `Precondition::if_match(ETag::new(etag.to_string()))` |
| `custom_headers: HashMap<...>` | `.with_custom_headers()` | |
| `excluded_regions: Option<Vec<RegionName>>` | `.with_excluded_regions()` | `Region::new(name.to_string())` for each |
| `content_response_on_write_enabled: bool` | | |

Driver change required: Added `custom_headers` support to `OperationOptions` (new field, setter, getter) and wired it into `build_transport_request` in `operation_pipeline.rs`. Custom headers may be removed in the future as we analyze which options are truly needed.

Response Bridge (Driver → SDK)
The driver returns an untyped `CosmosResponse { body: Vec<u8>, headers: CosmosResponseHeaders, status: CosmosStatus }`. The SDK returns a typed `CosmosResponse<T>` wrapping `azure_core::Response<T>`.

Approach: Reconstruct the SDK response from the driver's parts (body, headers, and status).
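A minimal sketch of that reconstruction, using stand-in types rather than the real `azure_core::Response<T>`. Here `FromStr` stands in for JSON deserialization, and `DriverResponse`, `SdkResponse<T>`, and `driver_response_to_sdk_response` are hypothetical names chosen for the example.

```rust
use std::str::FromStr;

// Stand-in for the driver's untyped response: raw bytes plus metadata.
struct DriverResponse {
    body: Vec<u8>,
    headers: Vec<(String, String)>,
    status: u16,
}

// Stand-in for the SDK's typed response.
#[derive(Debug)]
struct SdkResponse<T> {
    status: u16,
    headers: Vec<(String, String)>,
    body: T,
}

// Rebuilds a typed response from the driver's parts.
// Assumption: the body is UTF-8 text that `T` can parse; the real bridge
// would hand the bytes to the SDK's JSON machinery instead.
fn driver_response_to_sdk_response<T>(response: DriverResponse) -> Result<SdkResponse<T>, String>
where
    T: FromStr,
    T::Err: std::fmt::Display,
{
    let text = String::from_utf8(response.body)
        .map_err(|e| format!("body is not valid UTF-8: {e}"))?;
    let body = text
        .trim()
        .parse::<T>()
        .map_err(|e| format!("body could not be parsed: {e}"))?;
    Ok(SdkResponse {
        status: response.status,
        headers: response.headers,
        body,
    })
}
```

Status and headers pass through unchanged in this sketch; only the body changes representation.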
The header conversion maps each typed `CosmosResponseHeaders` field back to its raw header name/value pair (the reverse of the driver's `from_headers()` parser).

Caveat: Only headers that the driver explicitly parses are preserved (activity ID, request charge, session token, etag, continuation, item count, substatus). Any other server headers are lost. This covers all standard Cosmos response metadata. We will probably come back to this when we do the work on verifying the headers we want.
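The header conversion can be sketched as below. The struct is a simplified stand-in for the driver's typed headers (plain `String`/`f64`/`u32` instead of newtypes), and the header names follow the Cosmos DB REST API conventions; the exact constants the driver uses are not shown in this document, so treat them as assumptions.

```rust
// Simplified stand-in for the driver's typed response headers.
#[derive(Debug, Default)]
struct DriverResponseHeaders {
    activity_id: Option<String>,
    request_charge: Option<f64>,
    session_token: Option<String>,
    etag: Option<String>,
    continuation: Option<String>,
    item_count: Option<u32>,
    substatus: Option<u32>,
}

// Appends a raw header pair only when the typed field is present.
fn push_if_some<T: ToString>(
    raw: &mut Vec<(&'static str, String)>,
    name: &'static str,
    value: &Option<T>,
) {
    if let Some(v) = value {
        raw.push((name, v.to_string()));
    }
}

// Maps each typed field back to a raw name/value pair.
// Anything the driver did not parse into a field cannot be recovered here,
// which is the caveat described above.
fn to_raw_headers(headers: &DriverResponseHeaders) -> Vec<(&'static str, String)> {
    let mut raw = Vec::new();
    push_if_some(&mut raw, "x-ms-activity-id", &headers.activity_id);
    push_if_some(&mut raw, "x-ms-request-charge", &headers.request_charge);
    push_if_some(&mut raw, "x-ms-session-token", &headers.session_token);
    push_if_some(&mut raw, "etag", &headers.etag);
    push_if_some(&mut raw, "x-ms-continuation", &headers.continuation);
    push_if_some(&mut raw, "x-ms-item-count", &headers.item_count);
    push_if_some(&mut raw, "x-ms-substatus", &headers.substatus);
    raw
}
```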
CosmosRequest → Optional
The SDK's `CosmosResponse<T>` previously held the original `CosmosRequest`, a gateway pipeline concept with no driver equivalent. The driver uses `CosmosOperation` + `OperationOptions` instead, which are consumed during execution.

Decision: Made the `request` field `Option<CosmosRequest>`:
- Gateway path: `Some(request)`.
- Driver path: `None`.
- … `#[cfg(feature = "fault_injection")]` and marked `#[allow(dead_code)]`.

Structural Changes
ContainerClient
Added two fields, `driver` and `container_ref`, to `ContainerClient` so `read_item` can reach the driver at execution time. Previously, the driver was discarded after `new()` and `ContainerReference` was buried inside `ContainerConnection`.

driver_bridge Module
New private module at `src/driver_bridge.rs` containing:
- `driver_response_to_cosmos_response<T>()`: response conversion
- `item_options_to_operation_options()`: options translation
- `driver_response_headers_to_headers()`: converts the driver's typed response headers (e.g., `activity_id: Option<ActivityId>`, `request_charge: Option<RequestCharge>`) into raw `azure_core::Headers` key-value pairs for the SDK response

This module is the shared foundation for all future operation cutovers. Cutovers of `create_item`, `delete_item`, etc. reuse the same bridge functions.

Applying This Pattern to Other Operations
To cut over another item operation (e.g., `create_item`), follow this template:
1. Build the operation with the appropriate `CosmosOperation::*` factory method (e.g., `CosmosOperation::create_item(container_ref, pk)`).
2. Call `.with_body(bytes)` on the operation.
3. Translate options with `item_options_to_operation_options()` from `driver_bridge.rs`. For write-specific options (e.g., `content_response_on_write_enabled`), extend the bridge function.
4. Execute with `self.driver.execute_operation(operation, driver_options).await?`.
5. Convert the response with `driver_response_to_cosmos_response(driver_response)`.

The public method signature should not change.
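The template can be sketched end to end with stand-in types. Everything below is hypothetical and synchronous for brevity: the real factory method takes a container reference and partition key, the real `execute_operation` is awaited, and the real response is typed. The function names mirror the bridge functions only to show where each step sits.

```rust
// Stand-ins for the SDK options and the driver options (assumed fields).
#[derive(Debug, Clone, Default, PartialEq)]
struct ItemOptions {
    session_token: Option<String>,
    excluded_regions: Vec<String>,
}

#[derive(Debug, Clone, Default, PartialEq)]
struct OperationOptions {
    session_token: Option<String>,
    excluded_regions: Vec<String>,
}

// Stand-in for the driver's operation model.
#[derive(Debug)]
struct CosmosOperation {
    name: &'static str,
    partition_key: String,
    body: Option<Vec<u8>>,
}

impl CosmosOperation {
    fn create_item(partition_key: &str) -> Self {
        CosmosOperation {
            name: "create_item",
            partition_key: partition_key.to_string(),
            body: None,
        }
    }

    fn with_body(mut self, body: Vec<u8>) -> Self {
        self.body = Some(body);
        self
    }
}

#[derive(Debug)]
struct DriverResponse {
    status: u16,
    body: Vec<u8>,
}

// Hypothetical driver that echoes the written body back with 201 Created.
struct Driver;

impl Driver {
    fn execute_operation(
        &self,
        operation: CosmosOperation,
        _options: OperationOptions,
    ) -> Result<DriverResponse, String> {
        let CosmosOperation { name, body, .. } = operation;
        let body = body.ok_or_else(|| format!("{name} requires a body"))?;
        Ok(DriverResponse { status: 201, body })
    }
}

// Step 3: field-by-field options translation.
fn item_options_to_operation_options(options: &ItemOptions) -> OperationOptions {
    OperationOptions {
        session_token: options.session_token.clone(),
        excluded_regions: options.excluded_regions.clone(),
    }
}

// Step 5: response conversion (status plus body text in this sketch).
fn driver_response_to_text(response: DriverResponse) -> Result<(u16, String), String> {
    let body = String::from_utf8(response.body).map_err(|e| e.to_string())?;
    Ok((response.status, body))
}

fn create_item(
    driver: &Driver,
    partition_key: &str,
    item_json: &str,
    options: &ItemOptions,
) -> Result<(u16, String), String> {
    // Steps 1 and 2: build the operation and attach the body.
    let operation =
        CosmosOperation::create_item(partition_key).with_body(item_json.as_bytes().to_vec());
    let driver_options = item_options_to_operation_options(options);
    // Step 4: delegate execution to the driver.
    let driver_response = driver.execute_operation(operation, driver_options)?;
    driver_response_to_text(driver_response)
}
```

Only the factory call and the body attachment are specific to `create_item`; the translation and conversion steps are shared, which is why the bridge functions live in one module.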
Files Changed
| File | Change |
|---|---|
| `azure_data_cosmos_driver/src/options/operation_options.rs` | `custom_headers` field + setter/getter |
| `azure_data_cosmos_driver/src/driver/pipeline/operation_pipeline.rs` | |
| `azure_data_cosmos_driver/src/models/partition_key.rs` | `PartitionKeyValue` `pub` |
| `azure_data_cosmos_driver/src/models/mod.rs` | `PartitionKeyValue` |
| `azure_data_cosmos/src/driver_bridge.rs` | |
| `azure_data_cosmos/src/clients/container_client.rs` | `driver`/`container_ref` fields; rewrote `read_item` |
| `azure_data_cosmos/src/models/cosmos_response.rs` | `request` field optional |
| `azure_data_cosmos/src/partition_key.rs` | `into_driver_partition_key()` |
| `azure_data_cosmos/src/options/mod.rs` | `pub(crate)` accessors for bridge |
| `azure_data_cosmos/src/pipeline/mod.rs` | `CosmosResponse::new` call site |
| `azure_data_cosmos/src/lib.rs` | `mod driver_bridge` |

Open Items and Future Work
- … `ItemOptions` → `OperationOptions` translation may simplify or become unnecessary.
- … `PartitionKey` types and `into_driver_partition_key()` conversion should be eliminated once the types are unified.
- `CosmosRequest` removal: Once all operations are routed through the driver, the `Option<CosmosRequest>` field on `CosmosResponse<T>` can be removed entirely.
- `custom_headers` review: The `custom_headers` field on `OperationOptions` was added for feature parity. It may be removed as we analyze which options are truly needed at the driver level.
- `create_item`, `delete_item`, `replace_item`, `upsert_item`, `patch_item`, and query operations should follow the same pattern established here.

Fault Injection Wiring
When cutting `read_item` over to the driver, the SDK's fault injection tests initially failed because the two execution paths (gateway and driver) have independent fault injection systems. This section documents how they were connected.

Problem
The SDK and driver each have their own fault injection module (`azure_data_cosmos::fault_injection` and `azure_data_cosmos_driver::fault_injection`). They define parallel but separate types (`FaultInjectionRule`, `FaultInjectionCondition`, `FaultInjectionResult`, etc.) with identical variants but different Rust types. Prior to this work, only the gateway pipeline received fault injection rules; the driver was built without them.

Solution: Rule Translation with Shared State
The bridge module (`driver_bridge.rs`) includes `sdk_fi_rules_to_driver_fi_rules()`, which translates SDK fault injection rules into driver fault injection rules. The translation covers:
- `FaultOperationType`: variant-by-variant match (identical variant names)
- `FaultInjectionErrorType`: variant-by-variant match
- `FaultInjectionCondition`: `RegionName` → `Region`, operation type and container ID mapped directly
- `FaultInjectionResult`: `Duration` → `Option<Duration>`, probability copied
- … `start_time: Instant` → `Option<Instant>`, `end_time` and `hit_limit` copied

Shared Mutable State
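A minimal sketch of translating a rule while keeping its atomic state shared, using stand-in rule types. `SdkRule`, `DriverRule`, `should_apply`, and `sdk_rule_to_driver_rule` are hypothetical; the real translation also maps conditions, results, and timing fields, and goes through the driver's builder.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;

// Stand-in for the SDK rule: tests toggle and inspect it at runtime.
struct SdkRule {
    id: String,
    enabled: Arc<AtomicBool>,
    hit_count: Arc<AtomicU32>,
}

impl SdkRule {
    fn new(id: &str) -> Self {
        SdkRule {
            id: id.to_string(),
            enabled: Arc::new(AtomicBool::new(true)),
            hit_count: Arc::new(AtomicU32::new(0)),
        }
    }

    fn disable(&self) {
        self.enabled.store(false, Ordering::SeqCst);
    }

    fn enable(&self) {
        self.enabled.store(true, Ordering::SeqCst);
    }

    fn hit_count(&self) -> u32 {
        self.hit_count.load(Ordering::SeqCst)
    }
}

// Stand-in for the driver rule: a different Rust type holding the same Arcs.
struct DriverRule {
    id: String,
    enabled: Arc<AtomicBool>,
    hit_count: Arc<AtomicU32>,
}

impl DriverRule {
    // Called on the driver path; records a hit only while the rule is enabled.
    fn should_apply(&self) -> bool {
        if self.enabled.load(Ordering::SeqCst) {
            self.hit_count.fetch_add(1, Ordering::SeqCst);
            true
        } else {
            false
        }
    }
}

// Translation clones the Arc handles, not the values, so both rules
// observe and mutate one piece of state.
fn sdk_rule_to_driver_rule(rule: &SdkRule) -> DriverRule {
    DriverRule {
        id: rule.id.clone(),
        enabled: Arc::clone(&rule.enabled),
        hit_count: Arc::clone(&rule.hit_count),
    }
}
```

Cloning an `Arc` copies the pointer and bumps a reference count, so a test that calls `disable()` on the SDK rule affects the driver rule immediately, with no re-translation step.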
SDK `FaultInjectionRule` has `enabled: Arc<AtomicBool>` and `hit_count: Arc<AtomicU32>` that tests mutate at runtime (`.disable()`, `.enable()`, `.hit_count()`). The driver's `FaultInjectionRuleBuilder` accepts external `Arc`s via `with_shared_state()`, so both the SDK gateway path and the driver path reference the same atomic state. This means:
- `.disable()` on the SDK rule also disables it in the driver

Wiring in `CosmosClientBuilder`

In `CosmosClientBuilder::build()`:
- Before `FaultInjectionClientBuilder` is consumed for the gateway transport, `rules()` extracts a reference to the SDK rules
- `sdk_fi_rules_to_driver_fi_rules()` translates them to driver rules with shared state
- The driver rules are passed to `CosmosDriverRuntimeBuilder::with_fault_injection_rules()`
- The SDK's `fault_injection` Cargo feature now forwards to the driver's `fault_injection` feature

Test Patterns for Future Cutover
When cutting over additional operations, no additional fault injection wiring is needed; it's handled once at the `CosmosClientBuilder` level. However, tests that assert `request_url()` need to handle `None` for driver-routed operations.

`custom_response` Translation

Translation of `CustomResponse` (synthetic HTTP responses) is not yet implemented. None of the current tests use custom responses for `ReadItem` operations. When needed, the bridge function should be extended to translate `CustomResponse` fields (`status_code`, `headers`, `body`).

Consolidating to Driver Fault Injection After Cutover
The current dual-system architecture (SDK fault injection + driver fault injection + translation bridge) exists only because the cutover is incremental: some operations still go through the gateway while others go through the driver. Once all operations are routed through the driver:
- Drop `azure_data_cosmos::fault_injection`: the SDK's HTTP-client-level fault interception module becomes unreachable. Delete the entire `src/fault_injection/` directory.
- Re-export driver types: the SDK re-exports the driver's fault injection types directly.
- Remove the translation layer: `sdk_fi_rules_to_driver_fi_rules()` in `driver_bridge.rs` and the `shared_enabled()`/`shared_hit_count()` accessors on the SDK rule are no longer needed.
- Simplify `CosmosClientBuilder`: `with_fault_injection()` accepts `Vec<Arc<driver::FaultInjectionRule>>` directly and passes them to `CosmosDriverRuntimeBuilder::with_fault_injection_rules()`. No translation, no cloning, no intermediary builder.
- Update tests: tests construct driver `FaultInjectionRule` directly (same builders, same API) instead of SDK rules.

At that point the SDK has no fault injection logic of its own; it's a pass-through to the driver, matching the overall "SDK as thin wrapper" goal. The driver is the single source of truth for all transport-related concerns including fault injection.