Skip to content

Commit e708bc0

Browse files
tvaron3Copilot
andauthored
Slim PartitionKeyRange cache (#4393)
## Summary Slims the cached `PartitionKeyRange` from 14 fields to 6, mirroring memory optimization #2 from [Azure SDK for Python PR #46297](Azure/azure-sdk-for-python#46297). The wire contract is unchanged — the dropped JSON keys are silently ignored by serde on deserialization. ## Motivation The routing-map cache holds one `PartitionKeyRange` per `(container, range)` pair. With many containers — or several `CosmosClient` instances against the same account — that footprint dominates the per-container heap. The Cosmos service returns ~14 fields on `/pkranges`, but the routing layer reads only a handful. This PR slims the cached struct from **14 fields (~232 B)** to **6 fields (112 B on 64-bit)**. ## Changes ### `azure_data_cosmos_driver` **`PartitionKeyRange` slimmed:** the cached struct now retains the four fields the routing layer consults (`id`, `min_inclusive`, `max_exclusive`, `status`) plus `throughput_fraction` and `parents`, kept for downstream consumers that read them directly. Dropped: `resource_id`, `self_link`, `etag`, `timestamp`, `rid_prefix`, `target_throughput`, `lsn`, `owned_archival_pk_range_ids`. - `status` is the only retained field beyond the geometry triple consulted on the routing hot path: `validate_and_build_index` reads it to compute `highest_non_offline_pk_range_id` for split detection. - `throughput_fraction` and `parents` are retained on the cached representation for downstream consumers; the routing layer itself does not consult them. **`PartialEq` / `Hash` relaxed** to the routing-relevant identity tuple `(id, min_inclusive, max_exclusive)` — `_rid` is no longer in the comparison key. A grep across the workspace confirmed `PartitionKeyRange` is never used as a `HashMap` / `HashSet` / `BTreeMap` / `BTreeSet` *key* in this crate. ### Tests - `cached_size_stays_small` — asserts `size_of::<PartitionKeyRange>() <= 120 B` (re-bloat tripwire) AND `== 112 B` on 64-bit (silent layout-regression tripwire). - `deserialization_ignores_stripped_metadata_fields` — asserts that a service payload carrying every field still parses, that `throughput_fraction` is read as `0.5`, and that the dropped fields are silently absorbed. - `try_combine_online_to_offline_recomputes_highest_non_offline` — extended with the recovery flip (Online → Offline → Online round-trip) so split-detection semantics are pinned across the field reduction. ## Validation - `cargo fmt --check` clean - `cargo clippy -p azure_data_cosmos_driver --all-features --all-targets -- -D warnings` clean - `cargo test -p azure_data_cosmos_driver --lib --all-features` — **1439 passed** - `cspell` clean ## Reviewer notes The slim struct is a public-API breaking change (8 `pub` fields removed, `PartialEq` / `Hash` semantics relaxed); CHANGELOG documents both in a single entry. There is no behavioral change to routing or to the wire contract. A microbenchmark for the cache hot path was prototyped during development but is **not part of this PR** — it will be submitted separately under the `azure_data_cosmos_benchmarks` crate so reviewers of the model change can focus on the breaking-change surface here. --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 09d85ce commit e708bc0

3 files changed

Lines changed: 210 additions & 99 deletions

File tree

sdk/cosmos/azure_data_cosmos_driver/CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,7 @@
1212

1313
### Breaking Changes
1414

15+
- Slimmed the cached `PartitionKeyRange` to six fields, dropping eight metadata fields the routing-map cache never reads (`resource_id`, `self_link`, `etag`, `timestamp`, `rid_prefix`, `target_throughput`, `lsn`, `owned_archival_pk_range_ids`). The struct now retains the four fields the routing layer consults (`id`, `min_inclusive`, `max_exclusive`, `status`) plus `throughput_fraction` and `parents`, kept on the cached representation for downstream consumers that read them directly. As part of this change, `PartialEq` and `Hash` no longer hash `resource_id`: two ranges with the same `id` / `min_inclusive` / `max_exclusive` are now equal regardless of their `_rid`. Internal callers never used `PartitionKeyRange` as a hash-map key, but downstream consumers that did so should review their assumptions. Service responses are unchanged on the wire — the dropped JSON fields are silently ignored by serde on deserialization. ([#4393](https://github.com/Azure/azure-sdk-for-rust/pull/4393))
1516
- Changed `CosmosResponse::diagnostics()` to return `Arc<DiagnosticsContext>` instead of `&DiagnosticsContext`. The returned `Arc` derefs transparently for read-only inspection (existing call patterns like `response.diagnostics().activity_id()` continue to work), but bindings of the form `let d = response.diagnostics();` now own a cloned `Arc` handle rather than a borrow — letting callers retain operation diagnostics across `into_body()`. Replaces the additive `CosmosResponse::diagnostics_arc()` accessor introduced earlier in this preview cycle.
1617

1718
### Bugs Fixed

sdk/cosmos/azure_data_cosmos_driver/src/driver/cache/container_routing_map.rs

Lines changed: 93 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -348,22 +348,9 @@ mod tests {
348348
max_exclusive: &str,
349349
parents: Option<Vec<String>>,
350350
) -> PartitionKeyRange {
351-
PartitionKeyRange {
352-
id: id.into(),
353-
resource_id: None,
354-
self_link: None,
355-
etag: None,
356-
timestamp: None,
357-
min_inclusive: min_inclusive.into(),
358-
max_exclusive: max_exclusive.into(),
359-
rid_prefix: None,
360-
throughput_fraction: 0.0,
361-
target_throughput: None,
362-
status: Default::default(),
363-
lsn: 0,
364-
parents,
365-
owned_archival_pk_range_ids: None,
366-
}
351+
let mut r = PartitionKeyRange::new(id.into(), min_inclusive, max_exclusive);
352+
r.parents = parents;
353+
r
367354
}
368355

369356
fn single_range() -> Vec<PartitionKeyRange> {
@@ -619,6 +606,96 @@ mod tests {
619606
assert!(matches!(result, Err(RoutingMapError::OverlappingRanges)));
620607
}
621608

609+
/// Builds a range with an explicit lifecycle status. Used by the
610+
/// `highest_non_offline_pk_range_id` regression tests below; not exposed
611+
/// through the slim `PartitionKeyRange::new` API because production
612+
/// callers should not be hand-stamping `status`.
613+
fn make_range_with_status(
614+
id: &str,
615+
min_inclusive: &str,
616+
max_exclusive: &str,
617+
parents: Option<Vec<String>>,
618+
status: PartitionKeyRangeStatus,
619+
) -> PartitionKeyRange {
620+
let mut r = PartitionKeyRange::new(id.into(), min_inclusive, max_exclusive);
621+
r.parents = parents;
622+
r.status = status;
623+
r
624+
}
625+
626+
#[test]
627+
fn highest_non_offline_pk_range_id_picks_max_online_id() {
628+
// Three online ranges, ids "1", "2", "3" — highest should be 3.
629+
let map = ContainerRoutingMap::try_create(three_ranges(), None, None)
630+
.unwrap()
631+
.unwrap();
632+
assert_eq!(map.highest_non_offline_pk_range_id(), 3);
633+
}
634+
635+
/// Regression guard: when an incremental change-feed merge re-publishes
636+
/// an existing range with `status = Offline`, the cached
637+
/// `highest_non_offline_pk_range_id` must be recomputed from the merged
638+
/// view (not stay pinned at the pre-merge value). This is the exact
639+
/// path that would silently break if we ever stripped `status` without
640+
/// a replacement plumbed through `try_combine`.
641+
#[test]
642+
fn try_combine_online_to_offline_recomputes_highest_non_offline() {
643+
let map = ContainerRoutingMap::try_create(three_ranges(), None, None)
644+
.unwrap()
645+
.unwrap();
646+
assert_eq!(map.highest_non_offline_pk_range_id(), 3);
647+
648+
// Service marks range "3" as Offline. Same id/EPK extents, no
649+
// parents — this is a status flip, not a split.
650+
let new_ranges = vec![make_range_with_status(
651+
"3",
652+
"7F",
653+
"FF",
654+
None,
655+
PartitionKeyRangeStatus::Offline,
656+
)];
657+
658+
let merged = map
659+
.try_combine(new_ranges, Some("etag".into()))
660+
.unwrap()
661+
.unwrap();
662+
663+
// Range "3" is still in the routing map (offline ranges are NOT
664+
// gone — they're in a transient state and may flip back).
665+
assert!(
666+
merged.range_by_id.contains_key("3"),
667+
"Offline range should remain in the routing map"
668+
);
669+
// But it must no longer count toward the highest non-offline id.
670+
assert_eq!(
671+
merged.highest_non_offline_pk_range_id(),
672+
2,
673+
"Highest non-offline id should drop from 3 to 2 after the status flip"
674+
);
675+
676+
// Recovery path: the same range comes back Online (e.g., after a
677+
// planned failover completes). The highest-non-offline id must
678+
// bump back to 3 — `validate_and_build_index` recomputes from
679+
// scratch on every merge, so this should "just work", but a
680+
// future caching-of-highest optimization could silently break it.
681+
let recovered_ranges = vec![make_range_with_status(
682+
"3",
683+
"7F",
684+
"FF",
685+
None,
686+
PartitionKeyRangeStatus::Online,
687+
)];
688+
let recovered = merged
689+
.try_combine(recovered_ranges, Some("etag2".into()))
690+
.unwrap()
691+
.unwrap();
692+
assert_eq!(
693+
recovered.highest_non_offline_pk_range_id(),
694+
3,
695+
"Highest non-offline id should bump back to 3 after the range recovers"
696+
);
697+
}
698+
622699
// -- Length-aware EPK ordering tests --
623700

624701
#[test]

0 commit comments

Comments
 (0)