Skip to content

Slim PartitionKeyRange cache#4393

Merged
tvaron3 merged 6 commits into
Azure:release/azure_data_cosmos-previewsfrom
tvaron3:tvaron3/strip-pkrange-cache-fields
May 13, 2026
Merged

Slim PartitionKeyRange cache#4393
tvaron3 merged 6 commits into
Azure:release/azure_data_cosmos-previewsfrom
tvaron3:tvaron3/strip-pkrange-cache-fields

Conversation

@tvaron3

@tvaron3 tvaron3 commented May 12, 2026

Copy link
Copy Markdown
Member

Summary

Slims the cached PartitionKeyRange from 14 fields to 6, mirroring memory optimization #2 from Azure SDK for Python PR #46297. The wire contract is unchanged — the dropped JSON keys are silently ignored by serde on deserialization.

Motivation

The routing-map cache holds one PartitionKeyRange per (container, range) pair. With many containers — or several CosmosClient instances against the same account — that footprint dominates the per-container heap. The Cosmos service returns ~14 fields on /pkranges, but the routing layer reads only a handful.

This PR slims the cached struct from 14 fields (~232 B) to 6 fields (112 B on 64-bit).

Changes

azure_data_cosmos_driver

PartitionKeyRange slimmed: the cached struct now retains the four fields the routing layer consults (id, min_inclusive, max_exclusive, status) plus throughput_fraction and parents, kept for downstream consumers that read them directly. Dropped: resource_id, self_link, etag, timestamp, rid_prefix, target_throughput, lsn, owned_archival_pk_range_ids.

  • status is the only retained field beyond the geometry triple consulted on the routing hot path: validate_and_build_index reads it to compute highest_non_offline_pk_range_id for split detection.
  • throughput_fraction and parents are retained on the cached representation for downstream consumers; the routing layer itself does not consult them.

PartialEq / Hash relaxed to the routing-relevant identity tuple (id, min_inclusive, max_exclusive)_rid is no longer in the comparison key. A grep across the workspace confirmed PartitionKeyRange is never used as a HashMap / HashSet / BTreeMap / BTreeSet key in this crate.

Tests

  • cached_size_stays_small — asserts size_of::<PartitionKeyRange>() <= 120 B (re-bloat tripwire) AND == 112 B on 64-bit (silent layout-regression tripwire).
  • deserialization_ignores_stripped_metadata_fields — asserts that a service payload carrying every field still parses, that throughput_fraction is read as 0.5, and that the dropped fields are silently absorbed.
  • try_combine_online_to_offline_recomputes_highest_non_offline — extended with the recovery flip (Online → Offline → Online round-trip) so split-detection semantics are pinned across the field reduction.

Validation

  • cargo fmt --check clean
  • cargo clippy -p azure_data_cosmos_driver --all-features --all-targets -- -D warnings clean
  • cargo test -p azure_data_cosmos_driver --lib --all-features1439 passed
  • cspell clean

Reviewer notes

The slim struct is a public-API breaking change (8 pub fields removed, PartialEq / Hash semantics relaxed); CHANGELOG documents both in a single entry. There is no behavioral change to routing or to the wire contract.

A microbenchmark for the cache hot path was prototyped during development but is not part of this PR — it will be submitted separately under the azure_data_cosmos_benchmarks crate so reviewers of the model change can focus on the breaking-change surface here.

Mirror Azure SDK for Python PR #46297 (memory optimization #2):
strip 8 unused metadata fields from PartitionKeyRange in
azure_data_cosmos_driver, keeping only id/min_inclusive/max_exclusive/
status/throughput_fraction/parents. Stripped fields (resource_id,
self_link, etag, timestamp, rid_prefix, target_throughput, lsn,
owned_archival_pk_range_ids) are silently ignored by serde on
deserialization, so the wire contract is unchanged. Per-entry size
drops from ~232 to 112 bytes on 64-bit.

Relax PartialEq/Hash to the routing-relevant identity tuple
(id, min_inclusive, max_exclusive), matching the Java SDK's equals/
hashCode semantics. Internal callers never used PartitionKeyRange
as a hash-map key.

Add criterion bench (pkrange_cache) for cached and uncached
resolve_partition_key_range_id paths at n=1/10/100. The bench mock
transport now honors If-None-Match / 304 to terminate the cache
refresh loop without re-merging the same payload.

Add cached_size_stays_small regression test (cap 120 + exact pin
112 on 64-bit) and try_combine_online_to_offline_recomputes_highest_
non_offline plus the recovery flip to lock in routing semantics
across the field reduction.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
tvaron3 and others added 4 commits May 12, 2026 15:19
Per scope refinement, this PR keeps only the PartitionKeyRange
slim-down and the relaxed PartialEq/Hash. The criterion bench
(`pkrange_cache`) and the bench-only mock 304/If-None-Match
plumbing are reverted out of this PR; they can be reintroduced
separately if/when we want a tracked perf baseline.

Reverts:
- benches/pkrange_cache.rs (deleted)
- src/lib.rs: PKRANGES_ETAG, build_pkranges_payload, mock 304
  dispatch, MockHttpClientFactory pkranges plumbing, helper
  self-test
- Cargo.toml: [[bench]] pkrange_cache entry, serde_json dev-dep
- .cspell.json: bench-only words (benchcontainer, benchdb)
- Cargo.lock: regenerated (drops serde_json from
  azure_data_cosmos_benchmarks dev-deps)

The driver-side changes (slim struct, relaxed equality, regression
tests in container_routing_map and partition_key_range modules)
are unaffected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The PartialEq / Hash relaxation bullet previously cited the Java SDK
as the parity reference. Remove that cross-SDK note; the bullet now
just describes the new behavior in this crate.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Build 6287970 (Analyze stage) flagged 'namedtuple' as an unknown
word in two places — the CHANGELOG bullet that references Python's
slim namedtuple, and the throughput_fraction doc comment that does
the same. Add the term to sdk/cosmos/.cspell.json (alphabetically,
matching the cosmos convention of central allowlist over inline
disable directives).

Verified locally: cspell reports 0 issues on both files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Rewords the throughput_fraction doc comment to describe what the
field is and why it stays on the slim struct, without naming the
Python SDK or its namedtuple. Drops the same parity sentence from
the slim-struct CHANGELOG bullet.

Since 'namedtuple' no longer appears in tracked source, revert its
addition to sdk/cosmos/.cspell.json (added in the previous commit
to clear the Analyze stage). cspell is clean against the new text.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@tvaron3 tvaron3 marked this pull request as ready for review May 12, 2026 23:39
@tvaron3 tvaron3 requested a review from a team as a code owner May 12, 2026 23:39
Copilot AI review requested due to automatic review settings May 12, 2026 23:39

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR slims the Cosmos driver PartitionKeyRange model used by the routing-map cache and updates related routing-map tests and release notes. The benchmark described in the PR metadata was not present in the reviewed repository snapshot.

Changes:

  • Removed unused metadata fields from PartitionKeyRange and relaxed equality/hash semantics.
  • Added tests for deserializing service payloads with stripped metadata and preserving routing-map status behavior.
  • Updated the driver changelog with breaking-change notes.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
sdk/cosmos/azure_data_cosmos_driver/src/models/partition_key_range.rs Slims PartitionKeyRange, updates trait implementations, and adds size/deserialization tests.
sdk/cosmos/azure_data_cosmos_driver/src/driver/cache/container_routing_map.rs Updates test helpers and adds regression coverage for offline/online status recomputation.
sdk/cosmos/azure_data_cosmos_driver/CHANGELOG.md Documents the PartitionKeyRange breaking changes.

Comment thread sdk/cosmos/azure_data_cosmos_driver/CHANGELOG.md Outdated
Comment thread sdk/cosmos/azure_data_cosmos_driver/CHANGELOG.md Outdated
Comment thread sdk/cosmos/azure_data_cosmos_driver/CHANGELOG.md Outdated
@tvaron3 tvaron3 changed the title Slim PartitionKeyRange cache + add pkrange bench Slim PartitionKeyRange cache May 12, 2026
Addresses Copilot reviewer feedback on PR Azure#4393:

- Merge the two slim/relax bullets into one Breaking Changes entry, per
  the cosmos changelog instructions which require a single entry per PR.
- Reword the entry so it accurately distinguishes the four fields the
  routing layer consults (id, min_inclusive, max_exclusive, status) from
  the two fields kept on the cached representation for downstream
  consumers (throughput_fraction, parents). The previous wording said
  the struct keeps 'only the fields the routing-map cache reads', which
  was inaccurate since throughput_fraction is consumer-retained.
- Add the PR link suffix matching the rest of the file's style.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@FabianMeiswinkel FabianMeiswinkel left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-project-automation github-project-automation Bot moved this from Todo to Approved in CosmosDB Rust SDK and Driver May 13, 2026
@tvaron3 tvaron3 merged commit e708bc0 into Azure:release/azure_data_cosmos-previews May 13, 2026
12 checks passed
@github-project-automation github-project-automation Bot moved this from Approved to Done in CosmosDB Rust SDK and Driver May 13, 2026
tvaron3 added a commit to tvaron3/azure-sdk-for-rust that referenced this pull request May 18, 2026
**SDK fault-injection: expose driver type directly**
- Delete the SDK-side `FaultInjectionClientBuilder` wrapper (was a thin rules container after the earlier cleanup). `CosmosClientBuilder::with_fault_injection` now accepts the driver's `Vec<Arc<FaultInjectionRule>>` directly. Users build rules with the re-exported driver `FaultInjectionRuleBuilder` and pass the vector — no SDK-side intermediate type.
- Migrate every test consumer (`cosmos_fault_injection.rs`, `cosmos_multi_write_fault_injection.rs`, `cosmos_multi_write_retry_policies.rs`) plus the test framework (`TestOptions::fault_injection_rules`, `with_fault_injection_rules`, `TestClient::from_env_with_fault_rules`).
- Update module-level doc + the SDK-to-driver cutover doc to describe the new API.
- Update CHANGELOG entry.

**Pipeline matrix: fold Gateway 2.0 entries into live-platform-matrix**
- Delete `sdk/cosmos/live-gateway20-matrix.json` and the second `LiveTestMatrixConfigs` entry in `sdk/cosmos/ci.yml`. The two G2 settings (`Session SingleRegion Gateway20` and `Session MultiRegion Gateway20`) move into the existing matrix's `include` array, scoped to ubuntu only so they don't cross-multiply with the existing `Account Settings` × {ubuntu, windows, macos} grid (G2 tests are Linux-only). The pre-provisioned G2 account is still consumed via the `AZURE_COSMOS_GW20_*` env vars wired at the job level.

**Merge upstream/release/azure_data_cosmos-previews**
- Pulled in: `Azure#4393` (slim PartitionKeyRange cache), `Azure#4389` (hub-region-processing header), `Azure#4312` (local query-plan generator scaffolding).
- Two conflicts, both trivial: CHANGELOG (G2 entry vs new upstream entries, both kept), `operation_pipeline.rs` (upstream switched to `pipeline_type.is_data_plane()`; G2 work added `account_name.is_some()` second arg to `resolve_endpoint` — kept both).
- Test helper `build_minimal_transport_request` from upstream needed the G2 PR's new `effective_consistency` field on `TransportRequestContext`; set to `DefaultConsistencyLevel::Session`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
kundadebdatta added a commit that referenced this pull request May 18, 2026
…ation (#4419)

### What this PR does

Documentation-only port. Adds a single new file:

- `sdk/cosmos/azure_data_cosmos_benchmarks/docs/PPCB_MEMORY_ANALYSIS.md`
(+1,571 lines)

No source, manifest, test, or CI changes. No behavioral impact on any
shipped crate.

### Why

The Per-Partition Circuit Breaker (PPCB) subsystem in
`azure_data_cosmos_driver` carries lazy routing state whose worst-case
footprint scales with `partition_count × failed_regions_per_partition`.
Large-container customers (~80k physical partitions) have asked us for
concrete memory-cost guidance. This document is the consolidated answer,
captured under `azure_data_cosmos_benchmarks/docs` so it lives next to
the harness that produced the numbers.

It also documents an already-applied v2 optimization
(`SmallVec<[CosmosEndpoint; 4]>` with the `union` feature replacing
`HashSet<CosmosEndpoint>`, plus `CompactString` for
`PartitionKeyRangeId`) and validates it against both a synthesized
harness and a real Cosmos DB INT account.

### Document structure

| Section | Content |
|---|---|
| §1–§9 | Methodology, harness, workload, headline numbers,
per-allocation decomposition, per-entry cost model for the v1
(pre-optimization) driver. |
| §10–§14 | Findings, recommendations, risk/operational implications,
caveats, reproduction steps. |
| Appendix A/B | Raw DHAT stack frame table + glossary. |
| **§15** | **Verified v2 optimization results** — code changes applied,
type-size impact, DHAT before/after, re-projected scaling. |
| **§16** | **Real-account validation (v2)** — N = 115 partitions on a
real Cosmos INT account, cross-checked against §15 projections. |
| **§17** | **Real-account v1 vs v2 comparison** — apples-to-apples Δ on
the same INT account. |
| **§18** | **Simulated v1 vs v2 at 1k and 10k partitions** —
closed-form model derived from §15/§17 per-entry slopes; predictions for
fleet-scale N. |

### Headline numbers documented

**v1 (HashSet-based) at 80k partitions, fully tripped, 2 failed
regions/partition:**

| Dimension | PPCB Disabled | PPCB Enabled | Delta |
|---|---:|---:|---:|
| Peak live heap | 1,380 B | 24,604,302 B (≈ 23.46 MiB) | +24,602,922 B
|
| Peak live blocks | 13 | 160,014 | +160,001 |
| Bytes / partition entry | — | ~308 B | — |
| Heap blocks / partition entry | — | 2 | — |
| Leaks | 0 | 0 | — |

**v2 (`SmallVec` + `CompactString`) impact:**

- **Block-count pressure at 80k partitions: 160,014 → ~14 (−99.99 %)** —
the dominant operational win (eliminates ~160k allocator round-trips on
partition-event storms).
- **Peak heap at 80k partitions: ~23.5 MiB → ~17 MiB (−28 %)** — also
drops the single contiguous 20 MiB `HashMap` backing allocation,
reducing fragmentation risk.
- **`PartitionFailoverEntry` shrank 8 B** (unexpected secondary win from
`SmallVec`'s `union` feature reusing the discriminant niche).
- Confirmed on a real Cosmos INT account at N = 115 (§16/§17) and
modeled at N = 1k / 10k (§18).

### Reviewer guidance

- **Pure doc PR** — no diff against `Cargo.toml`, `src/`, `tests/`, or
CI.
- The file is 1,571 lines but is fully self-contained; reviewers don't
need to run anything.
- If you want to reproduce, §14 (synthesized harness), §16.10 / §17.7
(real account), and §18.8 (simulated 1k/10k) each have copy-paste
reproduction blocks.
- The code changes the document refers to (v2: `SmallVec` +
`CompactString`) are **already shipped** on
`release/azure_data_cosmos-previews` (see PR #4393 *Slim
PartitionKeyRange cache* in the immediate history) — this PR only adds
the report that motivated and validated them.

### Validation

- `git status` clean after commit (only pre-existing untracked
`sdk/cosmos/.vscode/` remains, untouched).
- Branch is exactly 1 commit ahead of
`origin/release/azure_data_cosmos-previews` (`962c1f7b7`).
- No build/test/clippy/fmt checks needed (Markdown-only change in a
`docs/` folder).

### Out of scope

- Any further memory work beyond `SmallVec` + `CompactString` (e.g.,
§11.3 *boxing the HashMap value* is explicitly deferred).
- Driver source changes (the v2 optimizations are already merged
separately).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Cosmos The azure_cosmos crate

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants