Commit 2ad6e08
## Summary
Implements the `PartitionKeyRangeCache` and `CollectionRoutingMap` in
the `azure_data_cosmos_driver` crate to support caching and resolution
of partition key ranges. This is the foundational layer required for
**partition-level failover** (PPAF/PPCB) — enabling the driver to
resolve a user-supplied partition key into the concrete partition key
range ID that owns it.
## Motivation
Cosmos DB distributes data across physical partitions, each owning a
contiguous range of the hash space described by a `[minInclusive,
maxExclusive)` pair. To support partition-level failover, the driver
must be able to:
1. Compute the **effective partition key (EPK)** from user-supplied
partition key values.
2. Resolve the EPK to the owning **partition key range ID** via binary
search.
3. Cache the routing map per collection and refresh it incrementally via
the change feed protocol when partitions split.
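
Step 2 above can be sketched as a binary search over ranges sorted by their lower bound. This is a minimal illustration, not the crate's code: the `PkRange` struct, the `find_owning_range` function, and the two-character hex bounds are hypothetical, and EPKs are assumed to compare lexicographically as strings.

```rust
/// Hypothetical, simplified stand-in for a partition key range.
#[derive(Debug, Clone)]
struct PkRange {
    id: String,
    min_inclusive: String, // lower EPK bound, inclusive
    max_exclusive: String, // upper EPK bound, exclusive
}

/// Small constructor helper used only by this sketch.
fn pk(id: &str, min: &str, max: &str) -> PkRange {
    PkRange {
        id: id.to_string(),
        min_inclusive: min.to_string(),
        max_exclusive: max.to_string(),
    }
}

/// Finds the range that owns `epk`. Assumes `sorted` is ordered by
/// `min_inclusive` and that the ranges do not overlap.
fn find_owning_range<'a>(sorted: &'a [PkRange], epk: &str) -> Option<&'a PkRange> {
    // Binary search: count the ranges whose lower bound is <= epk.
    let idx = sorted.partition_point(|r| r.min_inclusive.as_str() <= epk);
    // The candidate is the last range whose lower bound is <= epk.
    let candidate = sorted.get(idx.checked_sub(1)?)?;
    // [min, max) excludes the upper bound itself.
    (epk < candidate.max_exclusive.as_str()).then_some(candidate)
}

/// Three contiguous ranges covering ["", "FF").
fn sample_ranges() -> Vec<PkRange> {
    vec![pk("0", "", "3F"), pk("1", "3F", "7F"), pk("2", "7F", "FF")]
}
```

With these sample ranges, an EPK equal to a boundary such as `"3F"` resolves to the range that starts there, because the lower bound is inclusive and the upper bound is exclusive.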
## What's Included
### New Models (`src/models/`)
| File | Description |
|------|-------------|
| `range.rs` | Generic `Range<T>` with `contains()`, `check_overlapping()`, `MinComparer`, `MaxComparer`, and boundary semantics (`[min, max)`) |
| `partition_key_range.rs` | `PartitionKeyRange` model with 14 serde-annotated fields, `to_range()` conversion, `PartitionKeyRangeStatus` enum (Online/Splitting/Offline/Split), custom `PartialEq`/`Hash` on identity fields |
| `service_identity.rs` | `ServiceIdentity` model for direct-mode routing (federation ID, service name, master flag) |
| `murmur_hash.rs` | MurmurHash3 x64-128 and x86-32 implementations (ported from the Cosmos DB SDK / Austin Appleby's SMHasher) |
| `effective_partition_key.rs` | EPK computation engine that dispatches to V1 (MurmurHash3-32 + binary-encoded components) or V2 (MurmurHash3-128 + reversed bytes) based on partition key kind and version |
### Partition Key Value Extensions (`src/models/partition_key.rs`)
- Made `PartitionKeyValue` `pub` (was `pub(crate)`) for EPK hashing
access.
- Added `Infinity` variant to `InnerPartitionKeyValue` for EPK boundary
calculations.
- Added `write_for_hashing_v1()`, `write_for_hashing_v2()`,
`write_for_binary_encoding_v1()` methods for EPK computation.
- Added `PartitionKey::values()` accessor for the EPK engine.
### Cache Components (`src/driver/cache/`)
| File | Description |
|------|-------------|
| `collection_routing_map.rs` | `CollectionRoutingMap`: sorted routing map with O(log n) EPK→range binary search, O(1) ID lookup via `HashMap<String, (PartitionKeyRange, Option<ServiceIdentity>)>`, gone-parent filtering, completeness validation, `try_combine()` for incremental merge, `get_overlapping_ranges()`, and `highest_non_offline_pk_range_id` |
| `partition_key_range_cache.rs` | `PartitionKeyRangeCache`: async cache keyed by collection RID, backed by `AsyncCache<String, CollectionRoutingMap>`. Supports lazy fetch, change-feed incremental refresh (If-None-Match/304), `force_refresh` with ETag-based deduplication, and transport-decoupled `fetch_fn` callback |
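
The gone-parent filtering and completeness validation named in the table can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `PkRange` fields, the `build_routing_map` function, the `""`/`"FF"` bounds of the hash space, and the choice to report empty input as incomplete. Only the error names `OverlappingRanges` and `IncompleteRanges` come from this description.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified partition key range.
#[derive(Debug, Clone, PartialEq)]
struct PkRange {
    id: String,
    min_inclusive: String,
    max_exclusive: String,
    parents: Vec<String>, // IDs of the ranges this one was split from
}

#[derive(Debug, PartialEq)]
enum RoutingMapError {
    OverlappingRanges,
    IncompleteRanges,
}

// Assumed bounds of the whole EPK space for this sketch.
const MIN_EPK: &str = "";
const MAX_EPK: &str = "FF";

fn range(id: &str, min: &str, max: &str, parents: &[&str]) -> PkRange {
    PkRange {
        id: id.to_string(),
        min_inclusive: min.to_string(),
        max_exclusive: max.to_string(),
        parents: parents.iter().map(|p| p.to_string()).collect(),
    }
}

fn build_routing_map(ranges: Vec<PkRange>) -> Result<Vec<PkRange>, RoutingMapError> {
    // A range listed as any other range's parent has been split and is gone.
    let gone: HashSet<String> = ranges
        .iter()
        .flat_map(|r| r.parents.iter().cloned())
        .collect();
    let mut live: Vec<PkRange> = ranges
        .into_iter()
        .filter(|r| !gone.contains(&r.id))
        .collect();
    live.sort_by(|a, b| a.min_inclusive.cmp(&b.min_inclusive));

    // The surviving ranges must span the whole space...
    match (live.first(), live.last()) {
        (Some(first), Some(last))
            if first.min_inclusive == MIN_EPK && last.max_exclusive == MAX_EPK => {}
        _ => return Err(RoutingMapError::IncompleteRanges),
    }
    // ...and each range must start exactly where the previous one ends.
    for pair in live.windows(2) {
        let (prev, next) = (&pair[0], &pair[1]);
        if next.min_inclusive < prev.max_exclusive {
            return Err(RoutingMapError::OverlappingRanges);
        }
        if next.min_inclusive > prev.max_exclusive {
            return Err(RoutingMapError::IncompleteRanges);
        }
    }
    Ok(live)
}
```

Feeding this function a parent range together with its two children yields only the children, while a gap or an overlap between neighbours produces the corresponding error.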
### Design Spec (`docs/PARTITION_KEY_RANGE_CACHE_SPEC.md`)
Comprehensive specification covering:
- Architectural overview and component design
- EPK computation (V1/V2 hash algorithms)
- `CollectionRoutingMap` data structure, construction, validation, and
lookup algorithms
- Cache lifecycle (lazy fetch, steady state, incremental refresh,
invalidation)
- Error handling and fallback strategies
- Performance characteristics
- Cross-SDK feature matrix comparison (.NET, Java, Rust driver, Rust
SDK)
- Testing strategy with existing and recommended tests
- SDK deprecation and migration plan
- Future work priorities
## Architecture Highlights
### Transport Decoupling
The cache accepts a generic `Fn(String, Option<String>) -> Future<Output
= Option<PkRangeFetchResult>>` callback instead of holding a direct
reference to the transport pipeline. This keeps the cache fully
unit-testable without a live endpoint and supports both gateway and
direct mode.
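
A rough sketch of what this decoupling allows, assuming a callback shape like the one above. The `PkRangeFetchResult` fields, `load_ranges`, `fake_fetch`, and the tiny `block_on` executor are all hypothetical stand-ins; the real types are not shown in this description.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for the real fetch result type.
#[derive(Debug, Clone, PartialEq)]
struct PkRangeFetchResult {
    range_ids: Vec<String>,
    etag: Option<String>,
}

/// Loads ranges through an injected callback instead of a transport handle.
async fn load_ranges<F, Fut>(
    fetch_fn: F,
    collection_rid: &str,
    etag: Option<String>,
) -> Option<PkRangeFetchResult>
where
    F: Fn(String, Option<String>) -> Fut,
    Fut: Future<Output = Option<PkRangeFetchResult>>,
{
    fetch_fn(collection_rid.to_owned(), etag).await
}

/// Test double: answers from memory, so no live endpoint is needed.
async fn fake_fetch(rid: String, _etag: Option<String>) -> Option<PkRangeFetchResult> {
    (rid == "coll1").then(|| PkRangeFetchResult {
        range_ids: vec!["0".to_string(), "1".to_string()],
        etag: Some("etag-1".to_string()),
    })
}

/// Waker that does nothing; enough for futures that never actually suspend.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor so the sketch runs without an async runtime.
/// It simply re-polls, which is only sensible for futures that are
/// ready immediately, as the ones in this sketch are.
fn block_on<T>(fut: impl Future<Output = T>) -> T {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

Because `load_ranges` only sees the callback, a unit test can pass `fake_fetch` while production code would pass a closure that calls the gateway or direct transport.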
### ServiceIdentity Support
The `CollectionRoutingMap` stores `(PartitionKeyRange,
Option<ServiceIdentity>)` tuples per range, enabling future direct-mode
routing. Currently populated as `None` when fetched via gateway; will be
populated when the driver has direct connectivity information.
### Change Feed Protocol
The cache follows the Cosmos DB change feed pattern:
1. The initial fetch sends no `If-None-Match` value (the continuation is `None`) and returns all current ranges.
2. Subsequent incremental refreshes pass the previous `etag` as
continuation.
3. HTTP 304 Not Modified signals no changes since last fetch.
4. New ranges are merged via `try_combine()` with gone-parent filtering
and completeness validation.
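
The four steps above can be sketched as a synchronous loop. This is a simplified illustration under stated assumptions: `FetchOutcome`, `CachedRanges`, `refresh`, and `fake_server` are hypothetical names, and the merge is a plain append standing in for `try_combine()` (no gone-parent filtering or validation).

```rust
/// Hypothetical outcome of one partition key range read.
#[derive(Debug, Clone, PartialEq)]
enum FetchOutcome {
    /// HTTP 304: nothing changed since the supplied ETag.
    NotModified,
    /// HTTP 200: ranges changed since the supplied ETag, plus the new ETag.
    Changed { range_ids: Vec<String>, etag: String },
}

/// Hypothetical cache entry: known range IDs and the last seen ETag.
#[derive(Debug, Default, PartialEq)]
struct CachedRanges {
    range_ids: Vec<String>,
    etag: Option<String>,
}

/// Reads the feed from the cached ETag until the server reports 304.
fn refresh(cache: &mut CachedRanges, fetch: impl Fn(Option<&str>) -> FetchOutcome) {
    loop {
        // `None` on the first call means "no If-None-Match": full read.
        let outcome = fetch(cache.etag.as_deref());
        match outcome {
            FetchOutcome::NotModified => break,
            FetchOutcome::Changed { range_ids, etag } => {
                // Simplified merge: append only (assumption of this sketch).
                cache.range_ids.extend(range_ids);
                cache.etag = Some(etag);
            }
        }
    }
}

/// In-memory stand-in for the service: two pages of changes, then 304.
fn fake_server(if_none_match: Option<&str>) -> FetchOutcome {
    match if_none_match {
        None => FetchOutcome::Changed {
            range_ids: vec!["0".to_string()],
            etag: "e1".to_string(),
        },
        Some("e1") => FetchOutcome::Changed {
            range_ids: vec!["1".to_string(), "2".to_string()],
            etag: "e2".to_string(),
        },
        Some(_) => FetchOutcome::NotModified,
    }
}
```

Calling `refresh` again on an up-to-date cache issues a single request that returns `NotModified` and leaves the cache unchanged.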
### Single-Pending-I/O Semantics
Because the cache is backed by `AsyncCache` + `AsyncLazy`, concurrent requests for the same
collection share a single in-flight fetch. Post-initialization reads are
lock-free (`Arc` clone).
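
The single-fetch guarantee can be illustrated with a thread-based analogue built on the standard library's `OnceLock`. This is not how `AsyncCache` or `AsyncLazy` are implemented: `SingleFlightCache`, `get_or_fetch`, and `demo_fetch_count` are hypothetical, the sketch blocks threads rather than awaiting, and unlike the cache described above it takes a short mutex on every read.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};
use std::thread;

/// Thread-based analogue: one initialization per key, value shared via `Arc`.
struct SingleFlightCache<V> {
    slots: Mutex<HashMap<String, Arc<OnceLock<Arc<V>>>>>,
}

impl<V> SingleFlightCache<V> {
    fn new() -> Self {
        Self { slots: Mutex::new(HashMap::new()) }
    }

    fn get_or_fetch(&self, key: &str, fetch: impl FnOnce() -> V) -> Arc<V> {
        // Hold the map lock only long enough to find or create the slot.
        let slot = {
            let mut slots = self.slots.lock().unwrap();
            Arc::clone(slots.entry(key.to_owned()).or_default())
        };
        // `OnceLock` runs at most one initializer; other callers block
        // until it finishes and then observe the same value.
        Arc::clone(slot.get_or_init(|| Arc::new(fetch())))
    }
}

/// Spawns eight concurrent readers of the same key and reports how many
/// times the fetch closure actually ran.
fn demo_fetch_count() -> usize {
    let cache = Arc::new(SingleFlightCache::<Vec<String>>::new());
    let fetches = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..8)
        .map(|_| {
            let (cache, fetches) = (Arc::clone(&cache), Arc::clone(&fetches));
            thread::spawn(move || {
                cache.get_or_fetch("coll1", || {
                    fetches.fetch_add(1, Ordering::SeqCst);
                    vec!["0".to_string()]
                })
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    fetches.load(Ordering::SeqCst)
}
```

Every reader receives a clone of the same `Arc`, so the cost after initialization is a reference-count increment rather than a refetch.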
## Cross-SDK Feature Parity
The implementation achieves near-complete parity with .NET and Java
SDKs:
- ✅ EPK → range binary search
- ✅ Range ID → range HashMap lookup + ServiceIdentity
- ✅ Overlapping range resolution
- ✅ `forceRefresh` with ETag-based deduplication
- ✅ Change feed incremental refresh with `try_combine`
- ✅ Gone parent filtering
- ✅ `isGone(rangeId)` check
- ✅ `HighestNonOfflinePkRangeId` for split detection
- ✅ `PartitionKeyRangeStatus` enum
- ✅ Completeness validation (distinct `OverlappingRanges` vs
`IncompleteRanges` errors)
## Integration Status
The cache is currently **standalone** — it is `pub(crate)` with
`#[allow(unused_imports)]` on its re-export. It will be wired into the
operation pipeline when pre-flight partition key range resolution lands
(tracked separately). See the spec §8.2 for a sample `fetch_pk_ranges`
implementation showing the planned integration pattern.
## Testing
**43 unit tests** across all new modules:
- `collection_routing_map.rs`: 13 tests — construction, binary search,
overlapping ranges, gone filtering, validation errors, empty input
- `partition_key_range_cache.rs`: 4 tests — end-to-end resolve, empty PK
short-circuit, force refresh with incremental merge, response parsing
- `partition_key_range.rs`: 5 tests — construction, `to_range()`,
equality, serialization, overlap
- `range.rs`: 11 tests — creation, contains, overlap, edge cases,
comparers, serialization, string ranges
- `effective_partition_key.rs`: 5 tests — empty PK, V2 string/bool
hashes against cross-SDK reference values
- `murmur_hash.rs`: 2 tests — 128-bit float hash, 32-bit basic
- `service_identity.rs`: 3 tests — creation, application name, display
## Files Changed (11)
| File | Change |
|------|--------|
| `docs/PARTITION_KEY_RANGE_CACHE_SPEC.md` | **New**: Design specification (+1,013) |
| `src/driver/cache/collection_routing_map.rs` | **New**: Routing map with binary search (+560) |
| `src/driver/cache/mod.rs` | **Modified**: Register new modules and re-exports (+7) |
| `src/driver/cache/partition_key_range_cache.rs` | **New**: Async PK range cache (+430) |
| `src/models/effective_partition_key.rs` | **New**: EPK computation engine (+175) |
| `src/models/mod.rs` | **Modified**: Register new model modules, export `PartitionKeyValue` (+6/−2) |
| `src/models/murmur_hash.rs` | **New**: MurmurHash3 x64-128 and x86-32 (+206) |
| `src/models/partition_key.rs` | **Modified**: PK value hashing and encoding methods (+132/−1) |
| `src/models/partition_key_range.rs` | **New**: Partition key range model (+228) |
| `src/models/range.rs` | **New**: Generic range with boundary semantics (+350) |
| `src/models/service_identity.rs` | **New**: Service identity for direct-mode routing (+114) |
## Issue
Fixes #3999
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 7e11672 commit 2ad6e08
18 files changed
Lines changed: 4132 additions & 2 deletions
File tree
- sdk/cosmos
- azure_data_cosmos_driver
- docs
- src
- driver/cache
- models
- testdata