Slim PartitionKeyRange cache#4393
Merged
tvaron3 merged 6 commits intoMay 13, 2026
Merged
Conversation
Mirror Azure SDK for Python PR #46297 (memory optimization #2): strip 8 unused metadata fields from PartitionKeyRange in azure_data_cosmos_driver, keeping only id/min_inclusive/max_exclusive/ status/throughput_fraction/parents. Stripped fields (resource_id, self_link, etag, timestamp, rid_prefix, target_throughput, lsn, owned_archival_pk_range_ids) are silently ignored by serde on deserialization, so the wire contract is unchanged. Per-entry size drops from ~232 to 112 bytes on 64-bit. Relax PartialEq/Hash to the routing-relevant identity tuple (id, min_inclusive, max_exclusive), matching the Java SDK's equals/ hashCode semantics. Internal callers never used PartitionKeyRange as a hash-map key. Add criterion bench (pkrange_cache) for cached and uncached resolve_partition_key_range_id paths at n=1/10/100. The bench mock transport now honors If-None-Match / 304 to terminate the cache refresh loop without re-merging the same payload. Add cached_size_stays_small regression test (cap 120 + exact pin 112 on 64-bit) and try_combine_online_to_offline_recomputes_highest_ non_offline plus the recovery flip to lock in routing semantics across the field reduction. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per scope refinement, this PR keeps only the PartitionKeyRange slim-down and the relaxed PartialEq/Hash. The criterion bench (`pkrange_cache`) and the bench-only mock 304/If-None-Match plumbing are reverted out of this PR; they can be reintroduced separately if/when we want a tracked perf baseline. Reverts: - benches/pkrange_cache.rs (deleted) - src/lib.rs: PKRANGES_ETAG, build_pkranges_payload, mock 304 dispatch, MockHttpClientFactory pkranges plumbing, helper self-test - Cargo.toml: [[bench]] pkrange_cache entry, serde_json dev-dep - .cspell.json: bench-only words (benchcontainer, benchdb) - Cargo.lock: regenerated (drops serde_json from azure_data_cosmos_benchmarks dev-deps) The driver-side changes (slim struct, relaxed equality, regression tests in container_routing_map and partition_key_range modules) are unaffected. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The PartialEq / Hash relaxation bullet previously cited the Java SDK as the parity reference. Remove that cross-SDK note; the bullet now just describes the new behavior in this crate. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Build 6287970 (Analyze stage) flagged 'namedtuple' as an unknown word in two places — the CHANGELOG bullet that references Python's slim namedtuple, and the throughput_fraction doc comment that does the same. Add the term to sdk/cosmos/.cspell.json (alphabetically, matching the cosmos convention of central allowlist over inline disable directives). Verified locally: cspell reports 0 issues on both files. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Rewords the throughput_fraction doc comment to describe what the field is and why it stays on the slim struct, without naming the Python SDK or its namedtuple. Drops the same parity sentence from the slim-struct CHANGELOG bullet. Since 'namedtuple' no longer appears in tracked source, revert its addition to sdk/cosmos/.cspell.json (added in the previous commit to clear the Analyze stage). cspell is clean against the new text. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR slims the Cosmos driver PartitionKeyRange model used by the routing-map cache and updates related routing-map tests and release notes. The benchmark described in the PR metadata was not present in the reviewed repository snapshot.
Changes:
- Removed unused metadata fields from
PartitionKeyRangeand relaxed equality/hash semantics. - Added tests for deserializing service payloads with stripped metadata and preserving routing-map status behavior.
- Updated the driver changelog with breaking-change notes.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
sdk/cosmos/azure_data_cosmos_driver/src/models/partition_key_range.rs |
Slims PartitionKeyRange, updates trait implementations, and adds size/deserialization tests. |
sdk/cosmos/azure_data_cosmos_driver/src/driver/cache/container_routing_map.rs |
Updates test helpers and adds regression coverage for offline/online status recomputation. |
sdk/cosmos/azure_data_cosmos_driver/CHANGELOG.md |
Documents the PartitionKeyRange breaking changes. |
Addresses Copilot reviewer feedback on PR Azure#4393: - Merge the two slim/relax bullets into one Breaking Changes entry, per the cosmos changelog instructions which require a single entry per PR. - Reword the entry so it accurately distinguishes the four fields the routing layer consults (id, min_inclusive, max_exclusive, status) from the two fields kept on the cached representation for downstream consumers (throughput_fraction, parents). The previous wording said the struct keeps 'only the fields the routing-map cache reads', which was inaccurate since throughput_fraction is consumer-retained. - Add the PR link suffix matching the rest of the file's style. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
e708bc0
into
Azure:release/azure_data_cosmos-previews
12 checks passed
tvaron3
added a commit
to tvaron3/azure-sdk-for-rust
that referenced
this pull request
May 18, 2026
**SDK fault-injection: expose driver type directly**
- Delete the SDK-side `FaultInjectionClientBuilder` wrapper (was a thin rules container after the earlier cleanup). `CosmosClientBuilder::with_fault_injection` now accepts the driver's `Vec<Arc<FaultInjectionRule>>` directly. Users build rules with the re-exported driver `FaultInjectionRuleBuilder` and pass the vector — no SDK-side intermediate type.
- Migrate every test consumer (`cosmos_fault_injection.rs`, `cosmos_multi_write_fault_injection.rs`, `cosmos_multi_write_retry_policies.rs`) plus the test framework (`TestOptions::fault_injection_rules`, `with_fault_injection_rules`, `TestClient::from_env_with_fault_rules`).
- Update module-level doc + the SDK-to-driver cutover doc to describe the new API.
- Update CHANGELOG entry.
**Pipeline matrix: fold Gateway 2.0 entries into live-platform-matrix**
- Delete `sdk/cosmos/live-gateway20-matrix.json` and the second `LiveTestMatrixConfigs` entry in `sdk/cosmos/ci.yml`. The two G2 settings (`Session SingleRegion Gateway20` and `Session MultiRegion Gateway20`) move into the existing matrix's `include` array, scoped to ubuntu only so they don't cross-multiply with the existing `Account Settings` × {ubuntu, windows, macos} grid (G2 tests are Linux-only). The pre-provisioned G2 account is still consumed via the `AZURE_COSMOS_GW20_*` env vars wired at the job level.
**Merge upstream/release/azure_data_cosmos-previews**
- Pulled in: `Azure#4393` (slim PartitionKeyRange cache), `Azure#4389` (hub-region-processing header), `Azure#4312` (local query-plan generator scaffolding).
- Two conflicts, both trivial: CHANGELOG (G2 entry vs new upstream entries, both kept), `operation_pipeline.rs` (upstream switched to `pipeline_type.is_data_plane()`; G2 work added `account_name.is_some()` second arg to `resolve_endpoint` — kept both).
- Test helper `build_minimal_transport_request` from upstream needed the G2 PR's new `effective_consistency` field on `TransportRequestContext`; set to `DefaultConsistencyLevel::Session`.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
kundadebdatta
added a commit
that referenced
this pull request
May 18, 2026
…ation (#4419) ### What this PR does Documentation-only port. Adds a single new file: - `sdk/cosmos/azure_data_cosmos_benchmarks/docs/PPCB_MEMORY_ANALYSIS.md` (+1,571 lines) No source, manifest, test, or CI changes. No behavioral impact on any shipped crate. ### Why The Per-Partition Circuit Breaker (PPCB) subsystem in `azure_data_cosmos_driver` carries lazy routing state whose worst-case footprint scales with `partition_count × failed_regions_per_partition`. Large-container customers (~80k physical partitions) have asked us for concrete memory-cost guidance. This document is the consolidated answer, captured under `azure_data_cosmos_benchmarks/docs` so it lives next to the harness that produced the numbers. It also documents an already-applied v2 optimization (`SmallVec<[CosmosEndpoint; 4]>` with the `union` feature replacing `HashSet<CosmosEndpoint>`, plus `CompactString` for `PartitionKeyRangeId`) and validates it against both a synthesized harness and a real Cosmos DB INT account. ### Document structure | Section | Content | |---|---| | §1–§9 | Methodology, harness, workload, headline numbers, per-allocation decomposition, per-entry cost model for the v1 (pre-optimization) driver. | | §10–§14 | Findings, recommendations, risk/operational implications, caveats, reproduction steps. | | Appendix A/B | Raw DHAT stack frame table + glossary. | | **§15** | **Verified v2 optimization results** — code changes applied, type-size impact, DHAT before/after, re-projected scaling. | | **§16** | **Real-account validation (v2)** — N = 115 partitions on a real Cosmos INT account, cross-checked against §15 projections. | | **§17** | **Real-account v1 vs v2 comparison** — apples-to-apples Δ on the same INT account. | | **§18** | **Simulated v1 vs v2 at 1k and 10k partitions** — closed-form model derived from §15/§17 per-entry slopes; predictions for fleet-scale N. | ### Headline numbers documented **v1 (HashSet-based) at 80k partitions, fully tripped, 2 failed regions/partition:** | Dimension | PPCB Disabled | PPCB Enabled | Delta | |---|---:|---:|---:| | Peak live heap | 1,380 B | 24,604,302 B (≈ 23.46 MiB) | +24,602,922 B | | Peak live blocks | 13 | 160,014 | +160,001 | | Bytes / partition entry | — | ~308 B | — | | Heap blocks / partition entry | — | 2 | — | | Leaks | 0 | 0 | — | **v2 (`SmallVec` + `CompactString`) impact:** - **Block-count pressure at 80k partitions: 160,014 → ~14 (−99.99 %)** — the dominant operational win (eliminates ~160k allocator round-trips on partition-event storms). - **Peak heap at 80k partitions: ~23.5 MiB → ~17 MiB (−28 %)** — also drops the single contiguous 20 MiB `HashMap` backing allocation, reducing fragmentation risk. - **`PartitionFailoverEntry` shrank 8 B** (unexpected secondary win from `SmallVec`'s `union` feature reusing the discriminant niche). - Confirmed on a real Cosmos INT account at N = 115 (§16/§17) and modeled at N = 1k / 10k (§18). ### Reviewer guidance - **Pure doc PR** — no diff against `Cargo.toml`, `src/`, `tests/`, or CI. - The file is 1,571 lines but is fully self-contained; reviewers don't need to run anything. - If you want to reproduce, §14 (synthesized harness), §16.10 / §17.7 (real account), and §18.8 (simulated 1k/10k) each have copy-paste reproduction blocks. - The code changes the document refers to (v2: `SmallVec` + `CompactString`) are **already shipped** on `release/azure_data_cosmos-previews` (see PR #4393 *Slim PartitionKeyRange cache* in the immediate history) — this PR only adds the report that motivated and validated them. ### Validation - `git status` clean after commit (only pre-existing untracked `sdk/cosmos/.vscode/` remains, untouched). - Branch is exactly 1 commit ahead of `origin/release/azure_data_cosmos-previews` (`962c1f7b7`). - No build/test/clippy/fmt checks needed (Markdown-only change in a `docs/` folder). ### Out of scope - Any further memory work beyond `SmallVec` + `CompactString` (e.g., §11.3 *boxing the HashMap value* is explicitly deferred). - Driver source changes (the v2 optimizations are already merged separately).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Slims the cached
PartitionKeyRangefrom 14 fields to 6, mirroring memory optimization #2 from Azure SDK for Python PR #46297. The wire contract is unchanged — the dropped JSON keys are silently ignored by serde on deserialization.Motivation
The routing-map cache holds one
PartitionKeyRangeper(container, range)pair. With many containers — or severalCosmosClientinstances against the same account — that footprint dominates the per-container heap. The Cosmos service returns ~14 fields on/pkranges, but the routing layer reads only a handful.This PR slims the cached struct from 14 fields (~232 B) to 6 fields (112 B on 64-bit).
Changes
azure_data_cosmos_driverPartitionKeyRangeslimmed: the cached struct now retains the four fields the routing layer consults (id,min_inclusive,max_exclusive,status) plusthroughput_fractionandparents, kept for downstream consumers that read them directly. Dropped:resource_id,self_link,etag,timestamp,rid_prefix,target_throughput,lsn,owned_archival_pk_range_ids.statusis the only retained field beyond the geometry triple consulted on the routing hot path:validate_and_build_indexreads it to computehighest_non_offline_pk_range_idfor split detection.throughput_fractionandparentsare retained on the cached representation for downstream consumers; the routing layer itself does not consult them.PartialEq/Hashrelaxed to the routing-relevant identity tuple(id, min_inclusive, max_exclusive)—_ridis no longer in the comparison key. A grep across the workspace confirmedPartitionKeyRangeis never used as aHashMap/HashSet/BTreeMap/BTreeSetkey in this crate.Tests
cached_size_stays_small— assertssize_of::<PartitionKeyRange>() <= 120 B(re-bloat tripwire) AND== 112 Bon 64-bit (silent layout-regression tripwire).deserialization_ignores_stripped_metadata_fields— asserts that a service payload carrying every field still parses, thatthroughput_fractionis read as0.5, and that the dropped fields are silently absorbed.try_combine_online_to_offline_recomputes_highest_non_offline— extended with the recovery flip (Online → Offline → Online round-trip) so split-detection semantics are pinned across the field reduction.Validation
cargo fmt --checkcleancargo clippy -p azure_data_cosmos_driver --all-features --all-targets -- -D warningscleancargo test -p azure_data_cosmos_driver --lib --all-features— 1439 passedcspellcleanReviewer notes
The slim struct is a public-API breaking change (8
pubfields removed,PartialEq/Hashsemantics relaxed); CHANGELOG documents both in a single entry. There is no behavioral change to routing or to the wire contract.A microbenchmark for the cache hot path was prototyped during development but is not part of this PR — it will be submitted separately under the
azure_data_cosmos_benchmarkscrate so reviewers of the model change can focus on the breaking-change surface here.