Cosmos: Microbenchmark and some optimizations#4159
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds a Cosmos driver microbenchmark harness (no I/O via a mock transport) and applies several CPU-side optimizations in the driver hot path to reduce repeated parsing/allocations, aligning with the repo’s performance framework direction (Issues #3094 / #3095).
Changes:
- Add a new
azure_data_cosmos_benchmarkscrate (Criterion + optional pprof/dhat workflows) using an in-memory mock transport via a new internal driver feature flag. - Optimize the driver request path + signing-link computation by precomputing/borrowing paths (
ResourcePaths,Arc<str>container paths) and reusing parsed account endpoints. - Reduce header processing overhead by iterating headers once and carrying parsed
CosmosResponseHeadersthrough the pipeline to avoid redundant lookups/parsing.
Reviewed changes
Copilot reviewed 29 out of 30 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| sdk/cosmos/azure_data_cosmos_driver/src/testing.rs | Adds a feature-gated internal testing/mocking surface for benchmarks/harnesses. |
| sdk/cosmos/azure_data_cosmos_driver/src/models/resource_reference.rs | Caches container paths using Arc<str> and switches path accessors to borrowed strings. |
| sdk/cosmos/azure_data_cosmos_driver/src/models/mod.rs | Renames header-name module export and re-exports ResourcePaths internally. |
| sdk/cosmos/azure_data_cosmos_driver/src/models/cosmos_resource_reference.rs | Introduces ResourcePaths + compute_paths() to reduce duplicate allocations and adds tests. |
| sdk/cosmos/azure_data_cosmos_driver/src/models/cosmos_headers.rs | Consolidates header-name constants and optimizes CosmosResponseHeaders::from_headers. |
| sdk/cosmos/azure_data_cosmos_driver/src/models/account_reference.rs | Adds Deserialize for AccountEndpoint to parse once during account metadata load. |
| sdk/cosmos/azure_data_cosmos_driver/src/lib.rs | Exposes testing module behind __internal_mocking. |
| sdk/cosmos/azure_data_cosmos_driver/src/fault_injection/http_client.rs | Updates header handling to the new header-name constants approach. |
| sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/transport_pipeline.rs | Parses CosmosResponseHeaders once in the transport layer and passes them forward. |
| sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/http_client_factory.rs | Makes factory/config types public for feature-gated mocking API. |
| sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/cosmos_transport_client.rs | Makes request/response/transport traits public for feature-gated mocking API. |
| sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/authorization_policy.rs | Avoids signing-link allocations by carrying ResourcePaths into auth context. |
| sdk/cosmos/azure_data_cosmos_driver/src/driver/runtime.rs | Allows injecting a custom HTTP client factory under __internal_mocking. |
| sdk/cosmos/azure_data_cosmos_driver/src/driver/routing/routing_systems.rs | Reuses pre-parsed regional endpoint URLs. |
| sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/retry_evaluation.rs | Adjusts transport result destructuring for new outcome shape. |
| sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/operation_pipeline.rs | Uses compute_paths() and consumes parsed cosmos headers from transport results. |
| sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/components.rs | Carries parsed CosmosResponseHeaders in transport outcomes; raw headers only on errors. |
| sdk/cosmos/azure_data_cosmos_driver/src/driver/cosmos_driver.rs | Uses pre-parsed AccountEndpoint for write-region selection and updates tests. |
| sdk/cosmos/azure_data_cosmos_driver/src/driver/cache/account_metadata_cache.rs | Changes AccountRegion.database_account_endpoint from String to AccountEndpoint. |
| sdk/cosmos/azure_data_cosmos_driver/Cargo.toml | Adds the __internal_mocking feature flag. |
| sdk/cosmos/azure_data_cosmos_benchmarks/benches/point_read.rs | Adds Criterion point-read microbenchmark with optional pprof profiling. |
| sdk/cosmos/azure_data_cosmos_benchmarks/benches/heap_profile.rs | Adds dhat-based single-iteration heap profiling bench. |
| sdk/cosmos/azure_data_cosmos_benchmarks/benches/common.rs | Shared mock transport + driver setup/priming for benchmarks. |
| sdk/cosmos/azure_data_cosmos_benchmarks/README.md | Documents how to run benchmarks and generate profiles. |
| sdk/cosmos/azure_data_cosmos_benchmarks/Cargo.toml | New (non-published) benchmarks crate definition and dev-dependencies. |
| sdk/cosmos/azure_data_cosmos_benchmarks/.gitignore | Ignores dhat output JSON. |
| sdk/cosmos/azure_data_cosmos/src/models/cosmos_response.rs | Adds an internal constructor that accepts pre-parsed headers from the driver bridge. |
| sdk/cosmos/azure_data_cosmos/src/driver_bridge.rs | Avoids redundant header parsing by passing pre-parsed headers through. |
| Cargo.toml | Adds benchmarks crate to the workspace and adds workspace deps for dhat/pprof. |
| Cargo.lock | Locks new benchmark-related dependencies. |
Comments suppressed due to low confidence (1)
sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/operation_pipeline.rs:449
normalizedclonesrequest_pathviato_string()in the common case where it already starts with '/'. Sincecompute_paths()always builds non-empty paths with a leading '/', this adds an extra allocation on the hot path. Consider passingrequest_pathdirectly toUrl::set_path()(or usingCow<'_, str>to allocate only when a leading '/' needs to be added).
let request_path = paths.request_path();
let normalized = if request_path.starts_with('/') {
request_path.to_string()
} else if request_path.is_empty() {
String::new()
} else {
format!("/{}", request_path)
};
base.set_path(&normalized);
heaths
left a comment
There was a problem hiding this comment.
Approved for files outside sdk/cosmos. I had some questions/(slight) concerns under sdk/cosmos.
0b99c11 to
207174e
Compare
Add azure_data_cosmos_benchmarks crate with criterion-based microbenchmarks for CosmosDriver::execute_operation. The benchmarks replace the reqwest transport with an in-memory mock via a new __internal_mocking feature flag on azure_data_cosmos_driver, so 99% of the driver pipeline is exercised without network I/O. Benchmarks: - benches/point_read.rs: latency benchmark (~16 µs/iter) with pprof flamegraph support (--profile-time <sec>) - benches/heap_profile.rs: dhat heap trace for a single warm iteration - benches/common.rs: shared mock transport and setup Driver changes (behind __internal_mocking feature): - Expose TransportClient, HttpClientFactory, and related types publicly via a new testing module - Add with_mock_http_client_factory() to CosmosDriverRuntimeBuilder Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add an Arc pointer identity fast-path to LocationStateStore::sync_account_properties. When execute_operation fetches account metadata from the cache and the cache returns the same Arc it returned on the previous call, the entire sync — including prune_expired_unavailable_endpoints, the etag mutex, and the build_preferred_endpoints CAS loop — is skipped. Previously the etag fast-path only fired when the server supplied an etag. With no etag (e.g., test/emulator payloads), every call fell through to build_account_endpoint_state and allocated two new Arc<CosmosEndpointData> objects per request. The new last_synced_properties field stores a Mutex<Option<Arc<AccountProperties>>>. Arc::ptr_eq is used to compare pointers without touching the data. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add Headers::with_capacity(n) constructor to typespec headers and use it in build_transport_request with capacity 16. Cosmos requests insert ~12 headers per call; without pre-allocation hashbrown triggers 3 rehash allocations as it grows from 0 → 1 → 3 → 7 entries. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace format!() calls in generate_authorization and build_string_to_sign with String::with_capacity + push_str/write./InstallTool.sh --help url_encode now pre-allocates with the input length before collecting percent-encoded bytes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…me_based_path
ContainerReference already stores a pre-computed name_based_path Arc<str>
of the form "/dbs/{db}/colls/{coll}". name_path() now returns &str by
slicing off the leading '/', avoiding a format! allocation on every
set_session_token and get_session_token call.
The write path (set_session_token) still calls .to_owned() for the HashMap
insert key. The read path (get_session_token) is now allocation-free for
the name lookup, which is the hot path in steady-state operation.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
EndpointKey previously contained a String and was constructed via
TryFrom<&Url> using format!("{host}:{port}") once per transport pipeline
invocation (once per execute_operation call, not per retry).
Changes:
- EndpointKey now wraps Arc<str>; cloning is an atomic refcount increment
with no heap allocation.
- CosmosEndpointData caches the EndpointKey at construction time.
- CosmosEndpoint::endpoint_key() returns a cheap clone of the cached key.
- TransportPipelineContext gains an endpoint_key field so the transport
pipeline uses the pre-computed value instead of recomputing it.
- Re-export EndpointKey through the transport module so routing code
can reference it without accessing the private sharded_transport module.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…onflicts During cherry-pick of fix commits onto cosmos-benchmark-fixes, conflicts arose because the upstream release branch added new header constants (PRIORITY_LEVEL, THROUGHPUT_BUCKET) using the old static HeaderName approach while the fix commits switched to const &str with a unified header_names module. Resolves conflicts by: - Restoring split request_header_names / response_header_names modules (using const &str for performance, allowing duplicates across modules) - Updating operation_pipeline.rs import and removing spurious .clone() calls - Fixing fault_injection/http_client.rs to use response_header_names::SUBSTATUS - Fixing cosmos_driver.rs to use AccountEndpoint::url() instead of Url::parse(&str) since database_account_endpoint is now AccountEndpoint after 07940a9 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix doc comment on request_header_names: lowercase is required by azure_core HeaderName, not by HTTP/1.1 (which is case-insensitive) - Add real pointer-range assertion in assert_compute_paths_consistent to verify signing_link is a zero-copy sub-slice of request_path - Restore pub on transport types (HttpRequest, HttpResponse, etc.) — they must be pub for re-export via the testing module under __internal_mocking; keep non-exported fields pub(crate) - Remove unused response_header_names re-export from models/mod.rs Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
123a837 to
da55dfc
Compare
FabianMeiswinkel
left a comment
There was a problem hiding this comment.
cspell - either we disable it everywhere (except maybe md files) or we enforce cspell and whitelist exceptions - but disabling cspell for entire files defeats the purpose of having it (silently accidental regressions in those files will happen)
8f027b8
into
Azure:release/azure_data_cosmos-previews
After PR #4159 refactored from_headers() to use a match statement, the ITEM_LSN constant and its match arm were missing. Add them so item_lsn is actually parsed. Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-rust/sessions/92d9a4cd-3b32-465e-aa21-af8ef6c02508 Co-authored-by: simorenoh <30335873+simorenoh@users.noreply.github.com>
Switch request_header_names entries to &str consts to match the new upstream pattern from PR Azure#4159, while preserving the OFFER_THROUGHPUT and OFFER_AUTOPILOT_SETTINGS additions from this PR. Drops the .clone() calls that were only needed for the prior HeaderName style. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
I wrote up a quick microbenchmark of the driver's
execute_operationcode path, which excludes the I/O entirely by using a fixed Mock Transport (gated behind an internal feature flag). Using that, I was able to get a 40+% improvement in performance (again, these times exclude I/O, but should reflect CPU usage reductions).Before:
After
Obviously, these are tiny numbers, but I suspect they add up under high concurrency. We won't know for sure until this is run in the benchmarks. They should be safe, so I'm game to merge and see, but we could also try running the tip of this branch in the perf infra (cc @tvaron3 ) to get real results.
The optimizations I worked on were predominantly to reduce duplicate parsing and
HashMap::insert/HashMap::getcalls. Rather than callingHashMap::get_optional_str(...)for each header when materializing aCosmosResponseHeaders, we now iterate the map. Rather than re-parsingurl::Url, theAccountRegiontype now directly deserializes the gateway response into anAccountEndpoint, allowing us to reuse the sameurl::Urlfor each request. Rather than constantlyformat!-ing theCosmosResourceReference, we now produce a singleResourcePathsfrom it and allow getting slices from it for the various purposes.The benchmarks include instructions for running both profiled and unprofiled runs, and criterion does a great job of tracking results and showing you improvements over time. Having benchmark infra like this in place will be useful as we keep going.