
[Cosmos] Port read-item to use Driver#4053

Merged
simorenoh merged 23 commits into release/azure_data_cosmos-previews from read-item-update
Mar 31, 2026
Conversation

@simorenoh
Member

@simorenoh simorenoh commented Mar 26, 2026

This PR ports an initial method (read_item) to use the underlying driver as the connection. For now, I am sharing a spec of the proposed changes, in the hope that the same spec will work for migrating the remaining methods once we verify this one works.

The spec file is included in the PR to facilitate review and is also reproduced in the description below. The actual code implementation will follow.

SDK-to-Driver Cutover: Design Specification

Overview

This document describes the design for routing azure_data_cosmos SDK operations through the azure_data_cosmos_driver execution engine, replacing the legacy gateway pipeline path. The first operation cut over is ContainerClient::read_item, which serves as the reference pattern for all subsequent operations.

Context

Prior to this work, the Cosmos SDK had two separate execution paths:

  • Gateway pipeline (azure_data_cosmos): The SDK handled auth, routing, retries, and request construction via CosmosRequestGatewayPipeline → HTTP.
  • Driver (azure_data_cosmos_driver): A newer execution engine with its own transport, routing, and operation model (CosmosOperation + OperationOptions). Previously used only in driver-level tests.

PR #4005 bridged the two worlds by having ContainerClient::new() call driver.resolve_container() for eager metadata resolution. This PR takes the next step: routing the first data operation through the driver.

Goal

Make the SDK client a thin wrapper over the driver. The SDK translates public-facing types into driver concepts, delegates execution, and translates the response back. All real work (auth, routing, retries, transport) happens inside driver.execute_operation().

Architecture

Data Flow

User calls:     container_client.read_item(pk, id, options)
                              │
                    ┌─────────▼────────────┐
                    │  SDK ContainerClient │
                    └─────────┬────────────┘
                              │
          ┌───────────────────┼───────────────────┐
          │                   │                   │
  PartitionKey           ItemOptions        ContainerRef
  (SDK type)             (SDK type)        (driver type,
          │                   │           stored on client)
          │                   │                   │
          ▼                   ▼                   ▼
  into_driver_pk()   item_options_to_       ItemReference::
          │           operation_options()    from_name()
          │                   │                   │
          └───────────────────┼───────────────────┘
                              │
                    ┌─────────▼──────────┐
                    │  CosmosOperation:: │
                    │    read_item()     │
                    └─────────┬──────────┘
                              │
                    ┌─────────▼───────────┐
                    │  driver.execute_    │
                    │  operation(op, opts)│
                    │                     │
                    │  (auth, routing,    │
                    │   retries, HTTP)    │
                    └─────────┬───────────┘
                              │
                    ┌─────────▼───────────┐
                    │  driver_response_   │
                    │  to_cosmos_response │
                    └─────────┬───────────┘
                              │
                    ┌─────────▼───────────┐
                    │  CosmosResponse<T>  │
                    │  (SDK public type)  │
                    └─────────────────────┘

Key Principle

The SDK's public API does not change. read_item retains the same signature, return type, and observable behavior. This is a pure internal refactor.
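The data flow above can be sketched with stand-in types. This is a minimal illustration, not the actual implementation: every type here (SdkPartitionKey, DriverPartitionKey, Operation, Driver, SdkResponse) is a hypothetical placeholder, the mock driver fabricates a response instead of doing auth, routing, retries, or HTTP, and the real method is async while this sketch is synchronous.

```rust
// Hypothetical stand-ins for the SDK and driver types; not the real definitions.
struct SdkPartitionKey(String);
struct DriverPartitionKey(String);

#[allow(dead_code)]
struct Operation {
    container: String,
    item_id: String,
    pk: DriverPartitionKey,
}

struct DriverResponse {
    status: u16,
    body: Vec<u8>,
}

struct SdkResponse {
    status: u16,
    body: String,
}

// Mock driver: in the design above, this is where all real work would happen.
struct Driver;

impl Driver {
    fn execute_operation(&self, op: Operation) -> Result<DriverResponse, String> {
        let body = format!("{{\"id\":\"{}\",\"pk\":\"{}\"}}", op.item_id, op.pk.0);
        Ok(DriverResponse { status: 200, body: body.into_bytes() })
    }
}

struct ContainerClient {
    driver: Driver,
    container: String,
}

impl ContainerClient {
    // The public signature stays SDK-shaped; only the body of the method changes.
    fn read_item(&self, pk: SdkPartitionKey, id: &str) -> Result<SdkResponse, String> {
        // 1. Translate SDK types into driver concepts.
        let op = Operation {
            container: self.container.clone(),
            item_id: id.to_string(),
            pk: DriverPartitionKey(pk.0),
        };
        // 2. Delegate execution to the driver.
        let resp = self.driver.execute_operation(op)?;
        // 3. Translate the driver response back into the SDK's public type.
        let body = String::from_utf8(resp.body).map_err(|e| e.to_string())?;
        Ok(SdkResponse { status: resp.status, body })
    }
}
```

The caller only ever sees SdkPartitionKey and SdkResponse, which is the sense in which the wrapper is "thin".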

Design Decision: Driver as Required Infrastructure

An alternative approach was explored where the driver is optional — stored as Option<Arc<CosmosDriver>> on CosmosClient, DatabaseClient, and ContainerClient. In that model, each operation checks at runtime whether a driver is available: if so, it takes the driver path; otherwise, it falls back to the legacy gateway pipeline. Container metadata resolution is also optional and failure is silently ignored.

We chose not to take that approach: we want to verify behavior when only the driver is used, and this single method serves as the test. In this design, the driver is required:

  • CosmosClient stores Arc<CosmosDriver> (not Option).
  • ContainerClient::new() eagerly resolves container metadata via the driver and returns Result — if resolution fails, the client cannot be created.
  • Operations have a single codepath through the driver, with no gateway fallback.
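A rough sketch of the "driver is required" shape, using hypothetical stand-in types (this Driver, ContainerRef, and the String error are placeholders, and the real constructor is presumably async). It shows the two properties listed above: the client holds an Arc to the driver rather than an Option, and construction fails when metadata resolution fails.

```rust
use std::sync::Arc;

// Hypothetical stand-in driver that only knows a fixed set of containers.
struct Driver {
    known_containers: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
struct ContainerRef {
    name: String,
}

impl Driver {
    fn resolve_container(&self, name: &str) -> Result<ContainerRef, String> {
        if self.known_containers.iter().any(|c| c == name) {
            Ok(ContainerRef { name: name.to_string() })
        } else {
            Err(format!("container '{name}' not found"))
        }
    }
}

#[allow(dead_code)]
struct ContainerClient {
    driver: Arc<Driver>, // required: not Option<Arc<Driver>>
    container_ref: ContainerRef,
}

impl ContainerClient {
    // Eager resolution: a resolution failure surfaces here instead of being ignored.
    fn new(driver: Arc<Driver>, name: &str) -> Result<Self, String> {
        let container_ref = driver.resolve_container(name)?;
        Ok(Self { driver, container_ref })
    }
}
```

Because there is no Option to check, operations on the client have no branch that could fall back to another path.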

Rationale

The purpose of this cutover is to validate that the driver can fully replace the gateway pipeline for each operation. A fallback path undermines that goal:

  • Testability: If the driver path can silently fall back to the gateway, we can't be 100% sure that the driver path is exercised in production or tests. Failures would be hidden rather than surfaced.
  • Correctness: A dual-codepath design requires maintaining behavioral parity between two implementations indefinitely. A single path is easier to reason about, test, and debug.
  • Options fidelity: A fallback path tempts skipping the options translation (e.g., passing empty OperationOptions on the driver path), which silently drops user-configured session tokens, etags, and excluded regions.
  • Response fidelity: A minimal fallback implementation may skip reconstructing response headers from the driver's typed response, causing callers to get None for request_charge(), session_token(), and etag().

The cutover is intentionally incremental — one operation at a time. Operations that haven't been cut over yet continue using the gateway pipeline naturally (they don't call the driver). This gives us the gradual rollout benefit without the complexity of runtime branching within a single operation.

Type Translation Decisions

PartitionKey (SDK → Driver)

The SDK and driver define separate PartitionKey types with identical structure but in different crates. Both represent a JSON array of typed values (string, number, bool, null).

Approach: Added into_driver_partition_key() on the SDK's PartitionKey that maps each InnerPartitionKeyValue variant to the driver's PartitionKeyValue.

Driver change required: Made PartitionKeyValue pub (was pub(crate)) so the SDK crate can construct Vec<PartitionKeyValue> for the conversion.

Future consideration: Once Ashley's options alignment work unifies these types, this conversion can be eliminated, and we can just use the Driver's definitions the way we did with the ContainerReference.

// SDK partition_key.rs
pub(crate) fn into_driver_partition_key(self) -> driver::PartitionKey {
    let driver_values: Vec<DriverPKV> = self.0.into_iter()
        .map(|v| match v.0 {
            InnerPartitionKeyValue::String(s) => DriverPKV::from(s),
            InnerPartitionKeyValue::Number(n) => DriverPKV::from(n),
            InnerPartitionKeyValue::Bool(b) => DriverPKV::from(b),
            InnerPartitionKeyValue::Null => DriverPKV::from(Option::<String>::None),
            // ...
        })
        .collect();
    DriverPK::from(driver_values)
}

ItemOptions → OperationOptions

The SDK's ItemOptions (item-scoped request options) maps to the driver's OperationOptions field-by-field. The types in each field differ between crates, so values are bridged via their string representations.

| SDK ItemOptions field | Driver OperationOptions | Conversion |
| --- | --- | --- |
| session_token: Option<SessionToken> | .with_session_token() | DriverSessionToken::new(token.to_string()) |
| if_match_etag: Option<Etag> | .with_etag_condition() | Precondition::if_match(ETag::new(etag.to_string())) |
| custom_headers: HashMap<...> | .with_custom_headers() | Passed through directly (types are the same) |
| excluded_regions: Option<Vec<RegionName>> | .with_excluded_regions() | Region::new(name.to_string()) for each |
| content_response_on_write_enabled: bool | Ignored for reads | Driver always returns body for point reads |

Driver change required: Added custom_headers support to OperationOptions (new field, setter, getter) and wired it into build_transport_request in operation_pipeline.rs. Custom headers may be removed in the future as we analyze which options are truly needed.
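The field-by-field mapping can be illustrated as follows. This is a sketch under assumptions: all types are simplified stand-ins defined locally (the real driver uses builder-style setters such as .with_session_token(), which are replaced here by plain struct fields), and the newtypes wrap a String directly rather than going through to_string().

```rust
use std::collections::HashMap;

// Hypothetical SDK-side types.
struct SessionToken(String);
struct Etag(String);
struct RegionName(String);

#[allow(dead_code)]
#[derive(Default)]
struct ItemOptions {
    session_token: Option<SessionToken>,
    if_match_etag: Option<Etag>,
    custom_headers: HashMap<String, String>,
    excluded_regions: Option<Vec<RegionName>>,
    content_response_on_write_enabled: bool,
}

// Hypothetical driver-side types.
#[derive(Debug, PartialEq)]
struct DriverSessionToken(String);
#[derive(Debug, PartialEq)]
enum Precondition {
    IfMatch(String),
}
#[derive(Debug, PartialEq)]
struct Region(String);

#[derive(Debug, Default, PartialEq)]
struct OperationOptions {
    session_token: Option<DriverSessionToken>,
    etag_condition: Option<Precondition>,
    custom_headers: HashMap<String, String>,
    excluded_regions: Option<Vec<Region>>,
}

fn item_options_to_operation_options(options: ItemOptions) -> OperationOptions {
    OperationOptions {
        // Each typed value is bridged through its string representation.
        session_token: options.session_token.map(|t| DriverSessionToken(t.0)),
        etag_condition: options.if_match_etag.map(|e| Precondition::IfMatch(e.0)),
        // Same type on both sides, so the map is moved across unchanged.
        custom_headers: options.custom_headers,
        excluded_regions: options
            .excluded_regions
            .map(|regions| regions.into_iter().map(|r| Region(r.0)).collect()),
        // content_response_on_write_enabled is deliberately not mapped for reads.
    }
}
```

Translating every field, rather than passing empty options, is what keeps user-configured session tokens, etags, and excluded regions from being dropped.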

Response Bridge (Driver → SDK)

The driver returns an untyped CosmosResponse { body: Vec<u8>, headers: CosmosResponseHeaders, status: CosmosStatus }. The SDK returns a typed CosmosResponse<T> wrapping azure_core::Response<T>.

Approach: Reconstruct the SDK response from driver parts:

pub(crate) fn driver_response_to_cosmos_response<T>(
    driver_response: DriverResponse,
) -> CosmosResponse<T> {
    let status_code = driver_response.status().status_code();
    let headers = cosmos_response_headers_to_headers(driver_response.headers());
    let body = driver_response.into_body();

    let raw = RawResponse::from_bytes(status_code, headers, Bytes::from(body));
    let typed: Response<T> = raw.into();
    CosmosResponse::new(typed, None)
}

The header conversion maps each typed CosmosResponseHeaders field back to its raw header name/value pair (reverse of the driver's from_headers() parser).

Caveat: Only headers that the driver explicitly parses are preserved (activity ID, request charge, session token, etag, continuation, item count, substatus). Any other server headers are lost. This covers all standard Cosmos response metadata. We will probably come back to this when we do the work on verifying the headers we want.
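To make the caveat concrete, here is a simplified sketch of the typed-to-raw header conversion. TypedHeaders and typed_headers_to_raw are hypothetical stand-ins (the real code works with CosmosResponseHeaders and azure_core headers, and covers more fields); only a subset of the parsed headers is shown.

```rust
// Hypothetical stand-in for the driver's typed response headers (subset only).
#[derive(Default)]
struct TypedHeaders {
    activity_id: Option<String>,
    request_charge: Option<f64>,
    session_token: Option<String>,
    etag: Option<String>,
}

// Rebuild raw name/value pairs from the typed fields. Anything the driver did not
// parse into a typed field has no source here, so it cannot be reconstructed.
fn typed_headers_to_raw(h: &TypedHeaders) -> Vec<(&'static str, String)> {
    let mut raw = Vec::new();
    if let Some(v) = &h.activity_id {
        raw.push(("x-ms-activity-id", v.clone()));
    }
    if let Some(v) = h.request_charge {
        raw.push(("x-ms-request-charge", v.to_string()));
    }
    if let Some(v) = &h.session_token {
        raw.push(("x-ms-session-token", v.clone()));
    }
    if let Some(v) = &h.etag {
        raw.push(("etag", v.clone()));
    }
    raw
}
```

Fields that are None produce no header at all, which is why a bridge that skips this step would leave callers with None for request_charge(), session_token(), and etag().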

CosmosRequest → Optional

The SDK's CosmosResponse<T> previously held the original CosmosRequest — a gateway pipeline concept with no driver equivalent. The driver uses CosmosOperation + OperationOptions instead, which are consumed during execution.

Decision: Made the request field Option<CosmosRequest>:

  • Gateway-routed operations (all methods not yet cut over) continue setting Some(request).
  • Driver-routed operations set None.
  • The field is only accessed behind #[cfg(feature = "fault_injection")] and marked #[allow(dead_code)].
  • A TODO comment marks it for removal once all operations are on the driver.

Structural Changes

ContainerClient

Added two fields to ContainerClient so read_item can reach the driver at execution time:

pub struct ContainerClient {
    // ... existing fields ...
    driver: Arc<CosmosDriver>,         // retained from new()
    container_ref: ContainerReference,  // cloned before passing to ContainerConnection
}

Previously, the driver was discarded after new() and ContainerReference was buried inside ContainerConnection.

driver_bridge Module

New private module at src/driver_bridge.rs containing:

  • driver_response_to_cosmos_response<T>() — response conversion
  • item_options_to_operation_options() — options translation
  • driver_response_headers_to_headers() — converts the driver's typed response headers (e.g., activity_id: Option<ActivityId>, request_charge: Option<RequestCharge>) into raw azure_core::Headers key-value pairs for the SDK response

This module is the shared foundation for all future operation cutovers. When cutting over create_item, delete_item, etc., they reuse the same bridge functions.

Applying This Pattern to Other Operations

To cut over another item operation (e.g., create_item), follow this template:

  1. Build the operation: Use the appropriate CosmosOperation::* factory method (e.g., CosmosOperation::create_item(container_ref, pk)).
  2. Attach the body: For write operations, serialize the item to bytes and call .with_body(bytes) on the operation.
  3. Translate options: Reuse item_options_to_operation_options() from driver_bridge.rs. For write-specific options (e.g., content_response_on_write_enabled), extend the bridge function.
  4. Execute: Call self.driver.execute_operation(operation, driver_options).await?.
  5. Bridge response: Reuse driver_response_to_cosmos_response(driver_response).

The public method signature should not change.
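As an illustration of the five steps, a create_item cutover might look roughly like this. Everything below is a hypothetical mock: the Operation, OperationOptions, Driver, and SdkResponse types are local placeholders, the item is passed as a pre-serialized JSON string instead of being serialized from a typed value, and the call is synchronous where the real one is awaited.

```rust
#[allow(dead_code)]
struct Operation {
    container: String,
    partition_key: String,
    body: Option<Vec<u8>>,
}

impl Operation {
    fn create_item(container: &str, pk: &str) -> Self {
        Self { container: container.to_string(), partition_key: pk.to_string(), body: None }
    }

    fn with_body(mut self, body: Vec<u8>) -> Self {
        self.body = Some(body);
        self
    }
}

#[allow(dead_code)]
#[derive(Default)]
struct OperationOptions {
    session_token: Option<String>,
}

struct DriverResponse {
    status: u16,
    body: Vec<u8>,
}

// Mock driver: echoes the request body back with a 201 status.
struct Driver;

impl Driver {
    fn execute_operation(&self, op: Operation, _opts: OperationOptions) -> Result<DriverResponse, String> {
        let body = op.body.ok_or("create_item requires a body")?;
        Ok(DriverResponse { status: 201, body })
    }
}

struct SdkResponse {
    status: u16,
    body: Vec<u8>,
}

struct ContainerClient {
    driver: Driver,
    container: String,
}

impl ContainerClient {
    fn create_item(&self, pk: &str, item_json: &str, session_token: Option<String>) -> Result<SdkResponse, String> {
        // Steps 1 and 2: build the operation and attach the serialized body.
        let op = Operation::create_item(&self.container, pk).with_body(item_json.as_bytes().to_vec());
        // Step 3: translate options (a single field stands in for the full bridge).
        let opts = OperationOptions { session_token };
        // Step 4: execute through the driver.
        let resp = self.driver.execute_operation(op, opts)?;
        // Step 5: bridge the driver response back to the SDK type.
        Ok(SdkResponse { status: resp.status, body: resp.body })
    }
}
```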

Files Changed

| File | Change |
| --- | --- |
| azure_data_cosmos_driver/src/options/operation_options.rs | Added custom_headers field + setter/getter |
| azure_data_cosmos_driver/src/driver/pipeline/operation_pipeline.rs | Wired custom headers into request construction |
| azure_data_cosmos_driver/src/models/partition_key.rs | Made PartitionKeyValue pub |
| azure_data_cosmos_driver/src/models/mod.rs | Re-exported PartitionKeyValue |
| azure_data_cosmos/src/driver_bridge.rs | New: shared conversion module |
| azure_data_cosmos/src/clients/container_client.rs | Added driver/container_ref fields; rewrote read_item |
| azure_data_cosmos/src/models/cosmos_response.rs | Made request field optional |
| azure_data_cosmos/src/partition_key.rs | Added into_driver_partition_key() |
| azure_data_cosmos/src/options/mod.rs | Added pub(crate) accessors for bridge |
| azure_data_cosmos/src/pipeline/mod.rs | Updated CosmosResponse::new call site |
| azure_data_cosmos/src/lib.rs | Registered mod driver_bridge |

Open Items and Future Work

  • Options alignment: Ashley is working on aligning SDK options with the driver's options model. Once complete, the ItemOptions → OperationOptions translation may simplify or become unnecessary.
  • PartitionKey unification: The dual PartitionKey types and into_driver_partition_key() conversion should be eliminated once the types are unified.
  • CosmosRequest removal: Once all operations are routed through the driver, the Option<CosmosRequest> field on CosmosResponse<T> can be removed entirely.
  • custom_headers review: The custom_headers field on OperationOptions was added for feature parity. It may be removed as we analyze which options are truly needed at the driver level.
  • Remaining operations: create_item, delete_item, replace_item, upsert_item, patch_item, and query operations should follow the same pattern established here.

Fault Injection Wiring

When cutting read_item over to the driver, the SDK's fault injection tests initially failed because the two execution paths (gateway and driver) have independent fault injection systems. This section documents how they were connected.

Problem

The SDK and driver each have their own fault injection module (azure_data_cosmos::fault_injection and azure_data_cosmos_driver::fault_injection). They define parallel but separate types (FaultInjectionRule, FaultInjectionCondition, FaultInjectionResult, etc.) with identical variants but different Rust types. Prior to this work, only the gateway pipeline received fault injection rules — the driver was built without them.

Solution: Rule Translation with Shared State

The bridge module (driver_bridge.rs) includes sdk_fi_rules_to_driver_fi_rules(), which translates SDK fault injection rules into driver fault injection rules. The translation covers:

  • FaultOperationType — variant-by-variant match (identical variant names)
  • FaultInjectionErrorType — variant-by-variant match
  • FaultInjectionCondition: RegionName → Region; operation type and container ID mapped directly
  • FaultInjectionResult: Duration → Option<Duration>; probability copied
  • Timing fields: start_time: Instant → Option<Instant>; end_time and hit_limit copied
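The shape of this translation can be sketched as below. The types are hypothetical and heavily reduced (SdkOp, DriverOp, SdkRule, DriverRule, and the ReadItem/CreateItem variants are placeholders chosen for illustration); the sketch only shows the two patterns named in the list: a variant-by-variant match between parallel enums, and wrapping required SDK values in Some for the driver's optional fields.

```rust
use std::time::{Duration, Instant};

// Parallel enums with identical variant names but distinct Rust types.
enum SdkOp {
    ReadItem,
    CreateItem,
}

#[derive(Debug, PartialEq)]
enum DriverOp {
    ReadItem,
    CreateItem,
}

struct SdkRule {
    op: SdkOp,
    delay: Duration,
    start_time: Instant,
    probability: f32,
}

struct DriverRule {
    op: DriverOp,
    delay: Option<Duration>,
    start_time: Option<Instant>,
    probability: f32,
}

// An exhaustive match means a variant added on the SDK side fails to compile
// here until the mapping is extended.
fn translate_op(op: SdkOp) -> DriverOp {
    match op {
        SdkOp::ReadItem => DriverOp::ReadItem,
        SdkOp::CreateItem => DriverOp::CreateItem,
    }
}

fn translate_rule(rule: SdkRule) -> DriverRule {
    DriverRule {
        op: translate_op(rule.op),
        delay: Some(rule.delay),           // Duration → Option<Duration>
        start_time: Some(rule.start_time), // Instant → Option<Instant>
        probability: rule.probability,     // copied as-is
    }
}
```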

Shared Mutable State

SDK FaultInjectionRule has enabled: Arc<AtomicBool> and hit_count: Arc<AtomicU32> that tests mutate at runtime (.disable(), .enable(), .hit_count()). The driver's FaultInjectionRuleBuilder accepts external Arcs via with_shared_state(), so both the SDK gateway path and the driver path reference the same atomic state. This means:

  • Calling .disable() on the SDK rule also disables it in the driver
  • Hit counts are shared — both paths increment the same counter
  • Tests that toggle rules or assert hit counts work correctly across both paths
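The shared-state mechanism relies only on standard-library atomics behind Arc, so it can be sketched in isolation. SdkRule and DriverRule here are simplified stand-ins, and try_apply is a hypothetical method invented for the example; with_shared_state mirrors the name in the text, but its real signature is an assumption.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;

struct SdkRule {
    enabled: Arc<AtomicBool>,
    hit_count: Arc<AtomicU32>,
}

impl SdkRule {
    fn new() -> Self {
        Self {
            enabled: Arc::new(AtomicBool::new(true)),
            hit_count: Arc::new(AtomicU32::new(0)),
        }
    }

    fn disable(&self) {
        self.enabled.store(false, Ordering::SeqCst);
    }

    fn enable(&self) {
        self.enabled.store(true, Ordering::SeqCst);
    }

    fn hit_count(&self) -> u32 {
        self.hit_count.load(Ordering::SeqCst)
    }
}

struct DriverRule {
    enabled: Arc<AtomicBool>,
    hit_count: Arc<AtomicU32>,
}

impl DriverRule {
    // Takes clones of the SDK rule's Arcs, so both rules point at the same atomics.
    fn with_shared_state(enabled: Arc<AtomicBool>, hit_count: Arc<AtomicU32>) -> Self {
        Self { enabled, hit_count }
    }

    // Hypothetical: returns true and counts a hit if the rule is currently enabled.
    fn try_apply(&self) -> bool {
        if !self.enabled.load(Ordering::SeqCst) {
            return false;
        }
        self.hit_count.fetch_add(1, Ordering::SeqCst);
        true
    }
}
```

Cloning an Arc copies the pointer, not the atomic, so a store through the SDK rule is observed by the driver rule on its next load.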

Wiring in CosmosClientBuilder

In CosmosClientBuilder::build():

  1. Before the FaultInjectionClientBuilder is consumed for the gateway transport, rules() extracts a reference to the SDK rules
  2. sdk_fi_rules_to_driver_fi_rules() translates them to driver rules with shared state
  3. The translated rules are passed to CosmosDriverRuntimeBuilder::with_fault_injection_rules()
  4. The SDK's fault_injection Cargo feature now forwards to the driver's fault_injection feature

Test Patterns for Future Cutover

When cutting over additional operations, no additional fault injection wiring is needed — it's handled once at the CosmosClientBuilder level. However, tests that assert request_url() need to handle None for driver-routed operations:

// Gateway-routed operations return Some(url)
// Driver-routed operations return None
if let Some(url) = response.request_url() {
    assert_eq!(url.host_str().unwrap(), expected_endpoint);
}

custom_response Translation

Translation of CustomResponse (synthetic HTTP responses) is not yet implemented. None of the current tests use custom responses for ReadItem operations. When needed, the bridge function should be extended to translate CustomResponse fields (status_code, headers, body).

Consolidating to Driver Fault Injection After Cutover

The current dual-system architecture (SDK fault injection + driver fault injection + translation bridge) exists only because the cutover is incremental — some operations still go through the gateway while others go through the driver. Once all operations are routed through the driver:

  1. Drop azure_data_cosmos::fault_injection — the SDK's HTTP-client-level fault interception module becomes unreachable. Delete the entire src/fault_injection/ directory.

  2. Re-export driver types — the SDK re-exports the driver's fault injection types directly:

    #[cfg(feature = "fault_injection")]
    pub use azure_data_cosmos_driver::fault_injection;

  3. Remove the translation layer: sdk_fi_rules_to_driver_fi_rules() in driver_bridge.rs and the shared_enabled()/shared_hit_count() accessors on the SDK rule are no longer needed.

  4. Simplify CosmosClientBuilder: with_fault_injection() accepts Vec<Arc<driver::FaultInjectionRule>> directly and passes them to CosmosDriverRuntimeBuilder::with_fault_injection_rules(). No translation, no cloning, no intermediary builder.

  5. Update tests — tests construct driver FaultInjectionRule directly (same builders, same API) instead of SDK rules.

At that point the SDK has no fault injection logic of its own — it's a pass-through to the driver, matching the overall "SDK as thin wrapper" goal. The driver is the single source of truth for all transport-related concerns including fault injection.

@simorenoh simorenoh changed the title [Cosmos] Port read-item to use Driver [Cosmos] Spec: Port read-item to use Driver Mar 26, 2026
@github-actions github-actions Bot added the Cosmos The azure_cosmos crate label Mar 26, 2026
@simorenoh simorenoh linked an issue Mar 26, 2026 that may be closed by this pull request
@simorenoh simorenoh added the Client This issue points to a problem in the data-plane of the library. label Mar 26, 2026
Member

@tvaron3 tvaron3 left a comment


LGTM

@github-project-automation github-project-automation Bot moved this from Todo to Approved in CosmosDB Go/Rust Crew Mar 26, 2026
Comment thread sdk/cosmos/azure_data_cosmos/docs/sdk-to-driver-cutover.md
@simorenoh simorenoh marked this pull request as ready for review March 27, 2026 17:22
@simorenoh simorenoh requested a review from a team as a code owner March 27, 2026 17:22
Copilot AI review requested due to automatic review settings March 27, 2026 17:22
@simorenoh simorenoh changed the title [Cosmos] Spec: Port read-item to use Driver [Cosmos] Port read-item to use Driver Mar 27, 2026
Contributor

Copilot AI left a comment


Pull request overview

Ports the first SDK data-plane operation (ContainerClient::read_item) to execute through azure_data_cosmos_driver, establishing a reusable “SDK ↔ driver” translation layer intended to be replicated across the remaining operations.

Changes:

  • Added an SDK-side driver_bridge module to translate ItemOptions → OperationOptions and driver responses/headers → SDK CosmosResponse<T>.
  • Reworked ContainerClient::read_item to build a driver ItemReference + CosmosOperation and execute via CosmosDriver.
  • Extended the driver with OperationOptions::custom_headers and exported PartitionKeyValue to enable SDK→driver PK conversion.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| sdk/cosmos/azure_data_cosmos_driver/src/options/operation_options.rs | Adds custom_headers to driver OperationOptions. |
| sdk/cosmos/azure_data_cosmos_driver/src/models/partition_key.rs | Makes PartitionKeyValue public for cross-crate construction via From impls. |
| sdk/cosmos/azure_data_cosmos_driver/src/models/mod.rs | Re-exports PartitionKeyValue publicly from models. |
| sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/operation_pipeline.rs | Wires OperationOptions into transport request construction and applies custom_headers. |
| sdk/cosmos/azure_data_cosmos/src/pipeline/mod.rs | Updates gateway path to construct CosmosResponse with Some(request). |
| sdk/cosmos/azure_data_cosmos/src/partition_key.rs | Adds SDK→driver partition key conversion helper. |
| sdk/cosmos/azure_data_cosmos/src/options/mod.rs | Adds ItemOptions accessors used by the driver bridge. |
| sdk/cosmos/azure_data_cosmos/src/models/cosmos_response.rs | Makes stored request optional and updates fault-injection accessors accordingly. |
| sdk/cosmos/azure_data_cosmos/src/lib.rs | Registers the new driver_bridge module. |
| sdk/cosmos/azure_data_cosmos/src/driver_bridge.rs | New module implementing options + response/header translation. |
| sdk/cosmos/azure_data_cosmos/src/clients/container_client.rs | Stores driver/container reference on the client and routes read_item through the driver. |
| sdk/cosmos/azure_data_cosmos/docs/sdk-to-driver-cutover.md | Adds the design specification documenting the cutover approach. |

Comment thread sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/operation_pipeline.rs Outdated
Comment thread sdk/cosmos/azure_data_cosmos/src/partition_key.rs Outdated
Comment thread sdk/cosmos/azure_data_cosmos/src/models/cosmos_response.rs
@azure-pipelines

Command 'rust' is not supported by Azure Pipelines.

Supported commands
  • help:
    • Get descriptions, examples and documentation about supported commands
    • Example: help "command_name"
  • list:
    • List all pipelines for this repository using a comment.
    • Example: "list"
  • run:
    • Run all pipelines or specific pipelines for this repository using a comment. Use this command by itself to trigger all related pipelines, or specify specific pipelines to run.
    • Example: "run" or "run pipeline_name, pipeline_name, pipeline_name"
  • where:
    • Report back the Azure DevOps orgs that are related to this repository and org
    • Example: "where"

See additional documentation.

@analogrelay
Member

Phew, I actually didn't want to run the pipeline right now. @simorenoh when this is passing PR checks, can you kick off the live tests? We should make sure those pass before merging.

Member

@analogrelay analogrelay left a comment


Just minor stuff that Copilot already flagged and commentary on the future (as I tend to do ;))

Comment thread sdk/cosmos/azure_data_cosmos/src/clients/container_client.rs
Comment thread sdk/cosmos/azure_data_cosmos/src/models/cosmos_response.rs
Comment thread sdk/cosmos/azure_data_cosmos/src/driver_bridge.rs
Comment thread sdk/cosmos/azure_data_cosmos/src/driver_bridge.rs
@github-project-automation github-project-automation Bot moved this from Approved to Changes Requested in CosmosDB Go/Rust Crew Mar 27, 2026
@simorenoh simorenoh requested a review from tvaron3 March 27, 2026 19:11
@github-actions

github-actions Bot commented Mar 27, 2026

API Change Check

APIView identified API level changes in this PR and created the following API reviews

azure_data_cosmos_driver
azure_data_cosmos

@simorenoh
Member Author

/azp run rust - cosmos - weekly

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@simorenoh
Member Author

/azp run rust - cosmos - weekly

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@simorenoh
Member Author

/azp run rust - cosmos - weekly

@Azure Azure deleted a comment from azure-pipelines Bot Mar 31, 2026
@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@Azure Azure deleted a comment from azure-pipelines Bot Mar 31, 2026
@Azure Azure deleted a comment from azure-pipelines Bot Mar 31, 2026
@simorenoh simorenoh merged commit 71668fe into release/azure_data_cosmos-previews Mar 31, 2026
38 checks passed
@simorenoh simorenoh deleted the read-item-update branch March 31, 2026 03:05
@github-project-automation github-project-automation Bot moved this from Changes Requested to Done in CosmosDB Go/Rust Crew Mar 31, 2026

Labels

Client: This issue points to a problem in the data-plane of the library.
Cosmos: The azure_cosmos crate

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Cosmos: Use Driver to implement Read Item

4 participants