[Cosmos] Feed Range API #3987
```rust
)]
pub(crate) fn from_range(range: &Range<String>) -> Self {
    Self {
        min_inclusive: range.min.clone(),
```
This is functionally incorrect: you also have to verify/convert isMinIncluded/isMaxIncluded and the min/max EPK.
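To make the concern concrete, here is a minimal sketch of a conversion that checks the inclusivity flags instead of copying `min`/`max` blindly. It assumes a simplified, hypothetical `RoutingRange` stand-in (the real `Range<String>` type and its field names may differ) and simply rejects anything that is not `[min, max)`, since normalizing other inclusivity combinations would require computing neighbouring EPK values.

```rust
/// Simplified, hypothetical stand-in for the routing range type: explicit bounds
/// plus inclusivity flags. The real `Range<String>` fields may differ.
#[derive(Debug, Clone)]
pub struct RoutingRange {
    pub min: String,
    pub max: String,
    pub is_min_inclusive: bool,
    pub is_max_inclusive: bool,
}

/// Feed range with fixed `[min_inclusive, max_exclusive)` semantics.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FeedRange {
    pub min_inclusive: String,
    pub max_exclusive: String,
}

impl FeedRange {
    /// Converts a routing range, checking the inclusivity flags instead of
    /// assuming the bounds already mean `[min, max)`.
    pub fn from_range(range: &RoutingRange) -> Result<Self, String> {
        if !range.is_min_inclusive || range.is_max_inclusive {
            return Err(format!(
                "expected [min, max) semantics, got isMinInclusive={}, isMaxInclusive={}",
                range.is_min_inclusive, range.is_max_inclusive
            ));
        }
        Ok(Self {
            min_inclusive: range.min.clone(),
            max_exclusive: range.max.clone(),
        })
    }
}
```

Rejecting is the conservative choice: a silently mis-converted bound would make containment checks off by one EPK without any visible error.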
We also need a factory method that produces a FeedRange from a PartitionKey (even a partial one) plus a ContainerReference; it could live here or on the container.
A factory method that simply returns the FeedRanges for the physical partitions is also needed.
Added this, except for partial keys, since HPK work is currently in progress.
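The core of a "feed range from partition key" factory is a lookup: once the partition key has been hashed to its EPK, find the physical range whose `[min, max)` contains it. The sketch below assumes the EPK is already computed, that the physical ranges are sorted by `min_inclusive`, contiguous, and cover `["", "FF")`, and that hex EPK strings compare lexicographically. `PkRange` and `range_for_epk` are hypothetical names, not the SDK's API.

```rust
/// Simplified physical partition key range over hex-encoded EPK strings
/// (hypothetical shape, not the SDK's actual type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PkRange {
    pub id: String,
    pub min_inclusive: String,
    pub max_exclusive: String,
}

/// Returns the physical range whose `[min_inclusive, max_exclusive)` contains `epk`.
/// Assumes `ranges` is sorted by `min_inclusive` and contiguous.
pub fn range_for_epk<'a>(ranges: &'a [PkRange], epk: &str) -> Option<&'a PkRange> {
    // Number of ranges starting at or before `epk`; the candidate is the last of them.
    let idx = ranges.partition_point(|r| r.min_inclusive.as_str() <= epk);
    let candidate = ranges.get(idx.checked_sub(1)?)?;
    // The exclusive upper bound still has to be checked (e.g. `epk == "FF"`).
    (epk < candidate.max_exclusive.as_str()).then_some(candidate)
}
```

Binary search keeps the lookup logarithmic in the number of physical partitions; returning the physical ranges themselves as feed ranges is then just a mapping over the same sorted list.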
API Change Check: APIView identified API-level changes in this PR and created the following API reviews:
…dependency for caches (#4005)

# Summary

Adds `azure_data_cosmos_driver` as a dependency to the SDK and introduces the `ContainerReference` pattern for eager container metadata resolution. This is the first step toward using the driver as the SDK's internal transport/routing layer.

This PR is *not* the full cutover into the driver under the hood; its main purpose is to start that process with the bare minimum (current caches) without fully replacing the transport pipeline. That will be done in an entirely separate issue/PR. The cutover from the SDK's fault injection to the driver's fault injection for end-to-end testing against the driver will also be in a separate issue/PR.

This work is also meant to unblock other work, such as #3987.

## Design

`ContainerClient` construction now eagerly resolves immutable container metadata (RID, partition key definition) via the driver's `resolve_container()`, rather than doing per-operation cache lookups in `send()`. This mirrors how the driver's own `ContainerReference` works.

```
CosmosClient::build()
    |
    v
CosmosDriverRuntime → CosmosDriver (per-account singleton)
    |
    v
DatabaseClient::container_client("name").await?
    |
    v
driver.resolve_container(db, name) → ContainerReference
    |
    v
ContainerConnection stores ContainerReference
    → send() uses stored RID + PK def (no per-op cache lookup)
```

### SDK `ContainerReference` (No Model Sharing)

The SDK defines its own `pub(crate) ContainerReference` adapted from the driver's type via `from_driver_ref()`. This follows the versioning strategy in `AGENTS.md`: `azure_data_cosmos` cannot expose `azure_data_cosmos_driver` types directly.
## Changes

### SDK (`azure_data_cosmos`)

| File | Change |
|------|--------|
| `models/container_reference.rs` | New: `ContainerReference` with `from_driver_ref()`, `from_parts()`, accessors |
| `clients/cosmos_client.rs` | Added `driver: Arc<CosmosDriver>` field, passes it to `DatabaseClient` |
| `clients/cosmos_client_builder.rs` | Creates `CosmosDriverRuntime` + `CosmosDriver` in `build()`; commented out 5 builder unit tests (need fault injection linked from SDK to driver) |
| `clients/database_client.rs` | Added `driver` field; `container_client()` now returns `azure_core::Result<ContainerClient>` (breaking) |
| `clients/container_client.rs` | `new()` calls `driver.resolve_container()`, builds `ContainerReference`, returns `Result` |
| `handler/container_connection.rs` | Stores `ContainerReference`; `send()` uses stored metadata; fixed dual-cache-key bug |

### Dependency alignment

| File | Change |
|------|--------|
| Root `Cargo.toml` | Added `azure_data_cosmos_driver` workspace dependency |
| SDK `Cargo.toml` | Added driver dep, `azure_core` → workspace, `reqwest` feature forwards `driver/reqwest_native_tls` |
| Native `Cargo.toml` | `azure_core` → workspace |
| Perf `Cargo.toml` | `azure_core` + `azure_identity` → workspace |

### Call site updates (~59 sites across 19 files)

All `.container_client()` calls updated to `.container_client().await?` across tests, examples, the native crate, and the perf crate.

### Bug fix

`send()` previously used the container name (e.g., `"MyContainer"`) as the `pk_range_cache` key, while the cache parameter is named `collection_rid` and expects a RID. All lookups now consistently use `ContainerReference::collection_rid()`.

## Architecture notes

- The driver is only used for `resolve_container()` in this PR. The SDK's `GatewayPipeline` still handles all data plane operations. Full transport cutover is planned for a future PR.
- Both the SDK and the driver maintain independent HTTP transports, which is acceptable overhead for this phase.
- Deleting and recreating a container with the same name will cause existing `ContainerClient` instances to fail; this will be addressed in a follow-up covering container re-creation scenarios.
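The eager-resolution pattern and the cache-key fix can be illustrated with a small synchronous sketch. Everything here is hypothetical: `StubDriver` stands in for the driver's `resolve_container()`, async is dropped to keep the example self-contained, and `pk_range_cache_key()` only shows which key the per-operation path would use. The point is that resolution happens once at construction (and can fail, hence `Result`), and every later lookup keys on the RID rather than the container name.

```rust
use std::cell::Cell;
use std::collections::HashMap;

/// Immutable container metadata resolved once (hypothetical simplified shape).
#[derive(Debug, Clone)]
pub struct ContainerReference {
    collection_rid: String,
    pk_paths: Vec<String>,
}

impl ContainerReference {
    pub fn collection_rid(&self) -> &str {
        &self.collection_rid
    }
    pub fn pk_paths(&self) -> &[String] {
        &self.pk_paths
    }
}

/// Hypothetical stand-in for the driver; counts how often resolution runs.
pub struct StubDriver {
    pub resolve_calls: Cell<u32>,
    pub containers: HashMap<String, ContainerReference>, // keyed by "db/name"
}

impl StubDriver {
    pub fn resolve_container(&self, db: &str, name: &str) -> Result<ContainerReference, String> {
        self.resolve_calls.set(self.resolve_calls.get() + 1);
        self.containers
            .get(&format!("{db}/{name}"))
            .cloned()
            .ok_or_else(|| format!("container {db}/{name} not found"))
    }
}

pub struct ContainerConnection {
    container: ContainerReference,
}

impl ContainerConnection {
    /// Resolves metadata once, at construction time.
    pub fn new(driver: &StubDriver, db: &str, name: &str) -> Result<Self, String> {
        Ok(Self { container: driver.resolve_container(db, name)? })
    }

    /// Per-operation path: the partition key range cache is keyed by the RID,
    /// never by the user-facing container name.
    pub fn pk_range_cache_key(&self) -> &str {
        self.container.collection_rid()
    }
}
```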
…for-rust into feed-range-apis
Pull request overview
Adds initial Feed Range support to azure_data_cosmos, enabling consumers to retrieve physical partition feed ranges and deterministically map a partition key to the corresponding feed range (to be used later by query/change feed integrations).
Changes:
- Introduced a new public `FeedRange` type with base64(JSON) string round-tripping and overlap/containment helpers.
- Added `ContainerClient::read_feed_ranges()` (with `ReadFeedRangesOptions`) and `ContainerClient::feed_range_from_partition_key()`.
- Added emulator coverage for reading feed ranges and mapping partition keys to physical ranges.
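Because every feed range is half-open (`[min, max)`), the overlap and containment helpers reduce to a few string comparisons. The std-only sketch below assumes EPK bounds are same-case hex strings, so lexicographic `String` ordering matches EPK ordering; the base64(JSON) serialization is left out because it needs the `base64` crate. The type and method names are illustrative, not the PR's exact API.

```rust
/// Half-open EPK range `[min_inclusive, max_exclusive)` (illustrative sketch).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FeedRange {
    min_inclusive: String,
    max_exclusive: String,
}

impl FeedRange {
    pub fn new(min_inclusive: &str, max_exclusive: &str) -> Self {
        Self {
            min_inclusive: min_inclusive.to_string(),
            max_exclusive: max_exclusive.to_string(),
        }
    }

    /// The whole EPK space, `["", "FF")`.
    pub fn full() -> Self {
        Self::new("", "FF")
    }

    /// True if `other` lies entirely within `self`.
    pub fn contains(&self, other: &FeedRange) -> bool {
        self.min_inclusive <= other.min_inclusive && other.max_exclusive <= self.max_exclusive
    }

    /// Two half-open ranges overlap iff each one starts before the other ends.
    pub fn overlaps(&self, other: &FeedRange) -> bool {
        self.min_inclusive < other.max_exclusive && other.min_inclusive < self.max_exclusive
    }
}
```

With half-open semantics, adjacent ranges such as `["", "7F")` and `["7F", "FF")` share no EPK and correctly do not overlap, which is exactly the property that mixed inclusivity flags would break.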
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| sdk/cosmos/azure_data_cosmos/tests/emulator_tests/mod.rs | Registers the new feed range emulator test module. |
| sdk/cosmos/azure_data_cosmos/tests/emulator_tests/cosmos_feed_ranges.rs | Adds emulator tests for read_feed_ranges and feed_range_from_partition_key. |
| sdk/cosmos/azure_data_cosmos/src/options/mod.rs | Adds ReadFeedRangesOptions (supports force_refresh). |
| sdk/cosmos/azure_data_cosmos/src/lib.rs | Wires in the new feed_range module and re-exports FeedRange. |
| sdk/cosmos/azure_data_cosmos/src/hash.rs | Enhances EffectivePartitionKey to support ordering/hash and convenient conversions. |
| sdk/cosmos/azure_data_cosmos/src/handler/container_connection.rs | Exposes container PK definition and adds a routing map resolution helper with optional refresh. |
| sdk/cosmos/azure_data_cosmos/src/feed_range.rs | Implements the FeedRange type, serde + Display/FromStr, and unit tests. |
| sdk/cosmos/azure_data_cosmos/src/clients/container_client.rs | Adds the new ContainerClient feed range APIs. |
| sdk/cosmos/azure_data_cosmos/CHANGELOG.md | Documents the new public API additions. |
| sdk/cosmos/azure_data_cosmos/Cargo.toml | Adds base64 dependency for cross-SDK compatible serialization. |
| Cargo.lock | Locks base64 dependency version. |
```rust
use std::error::Error;

use azure_data_cosmos::{models::ContainerProperties, CreateContainerOptions, FeedRange};
```
`CreateContainerOptions` is imported but never used in this test file. CI sets `RUSTFLAGS=-Dwarnings`, so this will fail compilation; remove the unused import (or use it if intended).
```diff
-use azure_data_cosmos::{models::ContainerProperties, CreateContainerOptions, FeedRange};
+use azure_data_cosmos::{models::ContainerProperties, FeedRange};
```
```rust
// Cosmos DB always uses [min, max) semantics. Reject ranges with unexpected inclusivity
// to prevent subtly incorrect containment/overlap checks.
if !json.range.is_min_inclusive || json.range.is_max_inclusive {
    return Err(azure_core::Error::with_message(
        azure_core::error::ErrorKind::DataConversion,
        "feed range must have [min, max) semantics (isMinInclusive=true, isMaxInclusive=false)",
    ));
}

Ok(Self {
    min_inclusive: EffectivePartitionKey::from(json.range.min),
    max_exclusive: EffectivePartitionKey::from(json.range.max),
})
```
`FeedRange::from_str` (and the `Deserialize` impl) validate the inclusivity flags, but they don't validate that `min < max` and that both bounds are within the allowed EPK domain (e.g., `"" <= min < max <= "FF"`). This allows constructing invalid feed ranges that can break contains/overlaps semantics or future query integration; add bound/order validation and return a `DataConversion` error when invalid.
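The requested check could look roughly like the sketch below. It assumes the EPK domain is `["", "FF"]` as stated in the comment and that bounds compare lexicographically; `FeedRangeError` and `validate_bounds` are hypothetical names, and in the SDK the error would be mapped to an `azure_core` `DataConversion` error instead.

```rust
/// Limits of the EPK domain as described in the review comment.
const EPK_MIN: &str = "";
const EPK_MAX: &str = "FF";

/// Hypothetical error type; the SDK would map this to `ErrorKind::DataConversion`.
#[derive(Debug, PartialEq, Eq)]
pub enum FeedRangeError {
    OutOfDomain { min: String, max: String },
    NotAscending { min: String, max: String },
}

/// Checks `"" <= min < max <= "FF"`.
pub fn validate_bounds(min: &str, max: &str) -> Result<(), FeedRangeError> {
    if !(EPK_MIN <= min && max <= EPK_MAX) {
        return Err(FeedRangeError::OutOfDomain { min: min.to_string(), max: max.to_string() });
    }
    // An empty (min == max) or inverted range cannot contain any EPK.
    if min >= max {
        return Err(FeedRangeError::NotAscending { min: min.to_string(), max: max.to_string() });
    }
    Ok(())
}
```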
/azp run rust - cosmos - weekly

Azure Pipelines successfully started running 1 pipeline(s).
Adds MultiHash EPK (Effective Partition Key) computation and prefix partition key routing infrastructure to the driver crate, enabling hierarchical partition key (HPK) support for containers with multiple partition key paths.

## What changed

### MultiHash EPK computation (`effective_partition_key.rs`)

Previously, `EffectivePartitionKey::compute()` fell through to single-hash V2 for `PartitionKeyKind::MultiHash`, producing incorrect EPKs. MultiHash requires each component to be hashed independently; this PR adds that per-component hashing.

- Added a `PartitionKeyKind::MultiHash` arm to `compute()`, routing to the new `effective_partition_key_multi_hash_v2()`
- Each component is independently V2-encoded → MurmurHash3-128 → byte-reversed → top-2-bit masked → hex-encoded, then concatenated (N×32 hex chars)
- Extracted a shared `hash_v2_to_epk()` helper used by both the single-hash and multi-hash paths
- Algorithm verified against cross-SDK baselines (.NET, Go, Java) via existing `testdata/*.xml` fixtures

### Prefix EPK range computation (`effective_partition_key.rs`)

- Added `compute_range()` for partial/prefix partition keys (fewer components than the container definition)
- Full key → point range (start == end); prefix key on MultiHash → `[prefix_epk, prefix_epk + "FF")` range covering all possible suffix completions

### Prefix routing in PK range cache (`partition_key_range_cache.rs`)

- Added `resolve_partition_key_range_ids()`, which handles both full and prefix partition keys
- Full key: point lookup (single range ID); prefix key: EPK range → `resolve_overlapping_ranges()` → multiple range IDs for fan-out

### Tests

9 new unit tests covering:

- Single/two/three-component MultiHash EPK computation with expected value verification
- MultiHash with an `Undefined` component (partial HPK)
- MultiHash vs. single-hash divergence for multi-component keys
- `compute_range()` for full keys, prefix keys (1-of-3, 2-of-3), and single-hash (always point)

## Follow-up: FeedRange API (PR #3987)

PR [#3987](#3987) introduces `feed_range_from_partition_key()`, which currently uses the SDK's `hash.rs` to compute EPKs. For MultiHash containers, this hits the existing stub and returns incorrect results. Once that method is updated to route through the driver's EPK computation (as part of the SDK-to-driver cutover), it will get correct MultiHash support for free from this PR. Prefix HPK support for the FeedRange API (returning multiple feed ranges for partial keys) will additionally need the `compute_range()` infrastructure added here.

## Follow-up: Query tests and thorough testing

This is the first step of the end-to-end implementation of this feature. The remaining work (wiring the operations to this logic and ensuring that queries can also use it) relies on the migration to the driver. The migration of requests to the driver must be finalized before we can add these tests.
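The post-hash steps and the prefix range logic described above can be sketched without the hash itself. This is a sketch under stated assumptions: each component's 16-byte MurmurHash3-128 output (after V2 encoding) is taken as given input, the byte order is whatever that step produces, and `hash_to_epk`, `multi_hash_epk`, `epk_range`, and `overlapping_ids` are hypothetical names that only loosely mirror `hash_v2_to_epk()`, `compute_range()`, and `resolve_overlapping_ranges()`; this is not the driver's code.

```rust
/// One component's MurmurHash3-128 output; V2 encoding and hashing are assumed done elsewhere.
pub type Hash128 = [u8; 16];

/// byte-reverse -> clear the top two bits -> uppercase hex (32 chars).
pub fn hash_to_epk(mut hash: Hash128) -> String {
    hash.reverse();
    hash[0] &= 0x3F; // the two most significant bits are reserved
    hash.iter().map(|b| format!("{b:02X}")).collect()
}

/// MultiHash: each component is hashed on its own, and the per-component EPKs are concatenated.
pub fn multi_hash_epk(components: &[Hash128]) -> String {
    components.iter().map(|h| hash_to_epk(*h)).collect()
}

/// Full key -> point range (start == end); prefix key -> [prefix_epk, prefix_epk + "FF").
pub fn epk_range(components: &[Hash128], definition_paths: usize) -> (String, String) {
    let epk = multi_hash_epk(components);
    if components.len() < definition_paths {
        let end = format!("{epk}FF");
        (epk, end)
    } else {
        (epk.clone(), epk)
    }
}

/// IDs of physical ranges `(id, min, max)` touched by a point (start == end) or by `[start, end)`.
pub fn overlapping_ids<'a>(ranges: &[(&'a str, &str, &str)], start: &str, end: &str) -> Vec<&'a str> {
    ranges
        .iter()
        .filter(|(_, min, max)| {
            if start == end {
                *min <= start && start < *max // point lookup: containing range only
            } else {
                *min < end && start < *max // half-open overlap: fan-out
            }
        })
        .map(|(id, _, _)| *id)
        .collect()
}
```

A prefix key thus yields a range whose end is the prefix EPK followed by `"FF"`, so every full key that shares the prefix sorts inside it, and any physical range overlapping it becomes a fan-out target.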
Also removes nightly_windows matrix config from release pipeline
Increment package version after release of azure_data_cosmos_macros, azure_data_cosmos_driver, azure_data_cosmos
…for-rust into feed-range-apis
analogrelay left a comment
Looks like maybe you rebased something and got some extra version number update changes?
```diff
-min_inclusive: EffectivePartitionKey::from(""),
-max_exclusive: EffectivePartitionKey::from("FF"),
+min_inclusive: String::from(""),
+max_exclusive: String::from("FF"),
```
Why aren't we using the EffectivePartitionKey type here anymore?
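What the dedicated type buys can be shown with a hypothetical newtype (the SDK's actual `EffectivePartitionKey` in `hash.rs` may be shaped differently): it keeps the hex string's lexicographic ordering and hashing through derives, offers the same `From` conversions, yet cannot be confused with an arbitrary `String` such as a container name or range ID.

```rust
/// Hypothetical newtype illustrating the idea of `EffectivePartitionKey`.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct EffectivePartitionKey(String);

impl From<&str> for EffectivePartitionKey {
    fn from(s: &str) -> Self {
        Self(s.to_string())
    }
}

impl From<String> for EffectivePartitionKey {
    fn from(s: String) -> Self {
        Self(s)
    }
}

impl EffectivePartitionKey {
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Bounds typed as EPKs rather than plain strings.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FeedRange {
    pub min_inclusive: EffectivePartitionKey,
    pub max_exclusive: EffectivePartitionKey,
}

impl FeedRange {
    pub fn full() -> Self {
        Self {
            min_inclusive: EffectivePartitionKey::from(""),
            max_exclusive: EffectivePartitionKey::from("FF"),
        }
    }

    /// Half-open containment check that only accepts an EPK, not any string.
    pub fn contains_epk(&self, epk: &EffectivePartitionKey) -> bool {
        &self.min_inclusive <= epk && epk < &self.max_exclusive
    }
}
```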
```diff
 [package]
 name = "azure_data_cosmos"
-version = "0.32.0"
+version = "0.33.0"
```
This shouldn't be in this PR.
```markdown
## 0.33.0 (Unreleased)

### Features Added

### Breaking Changes

### Bugs Fixed

### Other Changes
```
Same here, this shouldn't be in this PR.
Sigh, Copilot was merging, I was merging, and together we made a mess. Making a new branch for this now.
Feed Range API
Adds FeedRange type and container-level APIs for working with physical partition ranges in Azure Cosmos DB.
New Public APIs
bypass the routing map cache.
retry-on-stale-routing-map logic for resilience during partition splits.
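The retry-on-stale-routing-map idea can be pictured with a small generic helper. This is a sketch under assumptions: `RoutingError`, `with_stale_map_retry`, and the retry-exactly-once policy are hypothetical, and the `force_refresh` flag passed to `resolve` only mirrors the idea of bypassing the routing map cache. The first attempt uses the cached map; if the operation reports that the map is stale (for example after a partition split), the map is re-resolved with a forced refresh and the operation runs once more.

```rust
/// Hypothetical error raised when an operation detects a stale cached routing map.
#[derive(Debug, PartialEq, Eq)]
pub enum RoutingError {
    StaleRoutingMap,
    Other(String),
}

/// Runs `op` against a routing map from `resolve(force_refresh)`; on a stale-map
/// error it refreshes the map and retries exactly once. Other errors pass through.
pub fn with_stale_map_retry<M, T>(
    mut resolve: impl FnMut(bool) -> M,
    mut op: impl FnMut(&M) -> Result<T, RoutingError>,
) -> Result<T, RoutingError> {
    let cached = resolve(false);
    match op(&cached) {
        Err(RoutingError::StaleRoutingMap) => {
            let refreshed = resolve(true);
            op(&refreshed)
        }
        other => other,
    }
}
```

Bounding the retry to one refresh keeps a persistently failing operation from looping while still absorbing the common case of a single split between cache fill and use.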
Internal Changes
ContainerReference from the driver.
Validation
Tests
This is the foundation for follow-up work plugging feed ranges into query and change feed APIs.