Skip to content

[DO NOT REVIEW] Cosmos: Support addressing databases and containers by RID#4613

Open
simorenoh wants to merge 11 commits into
mainfrom
simorenoh/cosmos-resolve-by-rid
Open

[DO NOT REVIEW] Cosmos: Support addressing databases and containers by RID#4613
simorenoh wants to merge 11 commits into
mainfrom
simorenoh/cosmos-resolve-by-rid

Conversation

@simorenoh

@simorenoh simorenoh commented Jun 16, 2026

Copy link
Copy Markdown
Member

Summary

Adds the ability to address Azure Cosmos databases and containers by resource ID (RID) in addition to name, and fixes the driver so RID-addressed requests actually authenticate against the gateway.

Fixes #4513

What changed

Public API (azure_data_cosmos)

  • New ResourceId newtype and ResourceIdentity enum. database_client(...) and container_client(...) accept impl Into<ResourceIdentity>, so a plain &str/String selects by name and ResourceId::from(...) selects by RID.
  • Mixing a name-addressed parent with a RID-addressed child (or vice versa) is rejected at construction — addressing must be consistent within a chain, matching the service contract.

Driver protocol fix (azure_data_cosmos_driver)
The earlier approach signed every request over the full name-style resource link and percent-encoded the RID in the URL, so RID reads were rejected with 401 Unauthorized. This PR matches the service's is_name_based = false rule:

  1. Signing — RID-addressed databases, containers, and their feeds/children sign over the lowercased resource RID only (the leaf for point reads, the parent for feeds), via a generalized signing_override (ResourcePaths.rid_based + rid_signing_override/is_rid_addressed helpers, generalizing the existing offers pattern).
  2. URL encoding — RID paths are sent raw; only name paths are percent-encoded (encode_path_segments is now name-path-only).
  3. No-mix enforcement — the driver rejects a mixed name/RID chain in release builds (validate_addressing) in addition to the debug assertion, so an inconsistent reference can never reach the wire.

Validation

  • cargo fmt + cargo clippy --all-features --all-targets clean on both crates.
  • Lib test suites: 0 failed (2002 driver + 100 SDK), including RID signing / raw-path / no-mix unit tests. Merged up to latest main.
  • Live end-to-end matrix, 9/9 PASS against a real account: read database by RID, read container by RID (cold cache), query_containers under a RID database, read_throughput by RID, query_items single- and cross-partition by RID, plus name-addressed controls.

Out of scope (follow-up)

Item point-operations (read_item/replace_item/delete_item/patch_item) by item _rid on RID-addressed containers — all SDK item methods address items by name today. By design we add no client-side guard for the mixed shape and let the service reject it. Tracked in #4612.

simorenoh and others added 3 commits June 12, 2026 15:06
Adds the ability to obtain `DatabaseClient` and `ContainerClient`
instances using a Cosmos resource ID (RID) in addition to a name, per
issue #4513.

Public crate (`azure_data_cosmos`):
- New `ResourceId` newtype and `#[non_exhaustive]` `ResourceIdentity`
  enum (owned, no lifetime parameter). `CosmosClient::database_client`
  and `DatabaseClient::container_client` now take
  `impl Into<ResourceIdentity>`, so a `&str`/`String` selects name
  addressing and a `ResourceId` selects RID addressing.
- `DatabaseClient` stores the identity, exposes `identity()`/`name()`/
  `rid()`, and skips the extra database read when resolving throughput
  for a RID-addressed database.
- `ContainerClient::new` enforces matching addressing modes (no mixing
  name and RID) and rejects a container RID whose derived parent
  database does not match the addressed database.

Driver crate (`azure_data_cosmos_driver`):
- `ContainerReference` is now addressing-mode aware: name-based fields
  are grouped behind an optional `NameAddressing`, equality/hashing key
  on account + container RID only, and `base_path()` selects the
  name- or RID-based path so item/sub-resource links keep name-based
  leaves over a RID-based container path.
- Added `resolve_container_by_rid`/`fetch_container_by_rid` (deriving
  the parent database RID from the container RID's bytes and validating
  container-level RID length), `read_container_by_rid`, a RID-aware
  container cache, and the `CLIENT_INVALID_RESOURCE_ID` /
  `CLIENT_MIXED_NAME_RID_ADDRESSING` statuses.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Cold-cache resolution by RID (e.g. container_client(rid)) failed against
real Cosmos with 401 Unauthorized because base64 RIDs were placed raw in
the request URL path. The gateway canonicalizes the path, derives a
resource id different from the one we signed, and rejects the request.

Encode the reserved characters in each path segment (notably the =
padding in RIDs) when building the request URL, while the authorization
signature continues to use the raw resource link via the same paths
buffer. Separators and unreserved characters are preserved; the helper
borrows its input when no encoding is needed.

Verified live: cold container_client(rid) resolve and container.read()
by RID now return 200 (previously 401).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
RID-addressed data-plane requests were always signed over the full
name-style resource link and had their RID percent-encoded in the URL,
so the gateway rejected them (401 for database/container reads by RID).

Match the service's `is_name_based = false` rule instead: when a
reference is RID-addressed, sign over the lowercased resource RID (the
leaf for point reads, the parent for feeds) and send the RID raw in the
request path. Name-addressed paths keep full-link signing and
percent-encoding.

- cosmos_resource_reference.rs: add `ResourcePaths.rid_based` plus a
  `signing_override` populated for RID resources via a new
  `rid_signing_override` helper; `is_rid_addressed` drives raw paths.
  Generalizes the existing offers signing-override pattern.
- operation_pipeline.rs: skip `encode_path_segments` for RID paths.
- Supersedes the percent-encoding band-aid; `encode_path_segments` is
  now name-path-only.

Validated live (db/container read by RID, query_containers under a RID
db, read_throughput by RID, query_items single- and cross-partition on
a RID container) plus name-addressed controls. Item point-ops by item
RID remain a follow-up.

Fixes #4513

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 16, 2026 20:34
@simorenoh simorenoh requested a review from a team as a code owner June 16, 2026 20:34
@github-actions github-actions Bot added the Cosmos The azure_cosmos crate label Jun 16, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Not ready to approve

There are confirmed correctness/compilation blockers in the updated signing logic and a non-compiling Send assertion that must be fixed before approval.

Pull request overview

This PR adds first-class support for addressing Cosmos databases/containers by RID (in addition to name) in the azure_data_cosmos public API, and updates the azure_data_cosmos_driver request-path + signing logic so RID-addressed operations authenticate and route correctly.

Changes:

  • Introduces ResourceId + ResourceIdentity and updates CosmosClient::database_client / DatabaseClient::container_client to accept either name or RID identity.
  • Updates driver resource reference/path computation to sign RID-addressed requests over the lowercased RID and to send RID paths raw (while still percent-encoding name paths).
  • Adds RID container resolution path (resolve_container_by_rid) plus related caching/keying adjustments.
File summaries
File Description
sdk/cosmos/azure_data_cosmos/tests/framework/test_client.rs Adjusts tests to pass container/database identifiers as &str to match the new Into<ResourceIdentity> API.
sdk/cosmos/azure_data_cosmos/tests/emulator_tests/cosmos_containers.rs Updates emulator container CRUD test to use &str for container identity.
sdk/cosmos/azure_data_cosmos/src/resource_identity.rs Adds the new public ResourceId + ResourceIdentity types and unit tests for conversions.
sdk/cosmos/azure_data_cosmos/src/lib.rs Wires the new module and re-exports ResourceId / ResourceIdentity.
sdk/cosmos/azure_data_cosmos/src/clients/database_client.rs Switches DatabaseClient to store ResourceIdentity, updates container_client signature, and adds identity accessors + RID shortcut for throughput lookups.
sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client.rs Updates database_client to accept impl Into<ResourceIdentity> and documents name vs RID usage.
sdk/cosmos/azure_data_cosmos/src/clients/container_client.rs Enforces no-mix name/RID addressing when constructing a ContainerClient and adds RID resolution path.
sdk/cosmos/azure_data_cosmos/CHANGELOG.md Documents the new identity API and the RID signing/path routing fix.
sdk/cosmos/azure_data_cosmos_driver/src/models/resource_reference.rs Makes ContainerReference optionally carry name-addressing data, adds base_path(), and updates equality/hash semantics for cache keying.
sdk/cosmos/azure_data_cosmos_driver/src/models/mod.rs Re-exports encode_path_segments for use by the pipeline.
sdk/cosmos/azure_data_cosmos_driver/src/models/cosmos_resource_reference.rs Generalizes RID signing overrides and tracks whether paths must be sent raw vs percent-encoded.
sdk/cosmos/azure_data_cosmos_driver/src/models/cosmos_operation.rs Adds a read_container_by_rid factory method.
sdk/cosmos/azure_data_cosmos_driver/src/error/cosmos_status.rs Adds client-side sub-status codes for invalid RIDs and mixed name/RID addressing.
sdk/cosmos/azure_data_cosmos_driver/src/driver/routing/session_container.rs Switches session-token keying to use ContainerReference::base_path().
sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/operation_pipeline.rs Applies percent-encoding only for name-based paths; sends RID paths raw.
sdk/cosmos/azure_data_cosmos_driver/src/driver/cosmos_driver.rs Implements container metadata fetch/resolve by RID (including decoding/validation).
sdk/cosmos/azure_data_cosmos_driver/src/driver/cache/container_cache.rs Adds by-RID fetch path and avoids populating the by-name index for RID-only references.
sdk/cosmos/azure_data_cosmos_driver/CHANGELOG.md Documents the RID signing + raw-path routing bug fix.

Copilot's findings

  • Files reviewed: 18/18 changed files
  • Comments generated: 4

Note

Your feedback helps us improve the quality of this feature.
Please use 👍 or 👎 to tell us whether this assessment is correct.

Comment on lines +242 to +251
async fn resource_id(&self) -> crate::Result<String> {
if let Some(rid) = self.rid() {
return Ok(rid.as_str().to_owned());
}
let db = self.read(None).await?.into_model()?;
Ok(db
.system_properties
.resource_id
.expect("service should always return a '_rid' for a database"))
}
Comment on lines 327 to 331
fn assert_send<T: Send>(_: T) {}
let client: &DatabaseClient = todo!();
assert_send(client.container_client(todo!()));
assert_send(client.container_client(todo!() as ResourceIdentity));
assert_send(client.read(todo!()));
assert_send(client.query_containers(Query::from("SELECT * FROM c"), todo!()));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be a valid cast, it'll just fail at runtime (and this is just testing compile-time conditions), but we should use something explicit rather than as ResourceIdentity.

Comment on lines +324 to +343
/// Returns the lowercased RID to sign against when this reference is
/// RID-addressed, or `None` when it is name-addressed (in which case the
/// full, case-preserved resource link is used for signing).
///
/// For RID-based requests the Cosmos master-key signature is computed over
/// the lowercased RID of the **signing resource** only — the leaf for point
/// operations and the parent for feed operations — not the full resource
/// link. This mirrors the `is_name_based = false` path in the service SDKs.
///
/// `is_feed` selects the parent (feed) vs. leaf (point op) signing resource.
fn rid_signing_override(&self, is_feed: bool) -> Option<String> {
if is_feed {
// Feed/parent-signed: parent is the account (no RID), the database,
// or the container, depending on the child resource type.
match self.resource_type {
ResourceType::DocumentCollection => self
.database
.as_ref()
.and_then(|db| db.rid())
.map(str::to_lowercase),
Comment on lines +447 to +461
/// Reads a container's properties by database and container RID.
///
/// Like [`read_container_by_name`](Self::read_container_by_name) but addresses
/// the container by RID. `database` must itself be a RID-based
/// [`DatabaseReference`] so the request path is fully RID-based
/// (`/dbs/{db_rid}/colls/{container_rid}`).
pub fn read_container_by_rid(
database: DatabaseReference,
container_rid: impl Into<std::borrow::Cow<'static, str>>,
) -> Self {
let resource_ref: CosmosResourceReference = CosmosResourceReference::from(database)
.with_resource_type(ResourceType::DocumentCollection)
.with_rid(container_rid.into());
Self::new(OperationType::Read, resource_ref, None)
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is an infallible API, I think Copilot has it right here. We should either make this API fallible and have it fail if it's given a non-RID database reference, or change the API to stop that. The debug assert is good, but we need to consider what happens in production.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I didn't consider this since I figured the network request for resolution would fail regardless. will reinforce this _rid path - my take is changing the signature to include account reference so that the factory builds the RID-based DatabaseReference itself from raw db_rid/container_rid strings, instead of trusting the caller to hand in a correctly-addressed one. main concern with this was the asymmetry on signatures but the rid-path is meant to be used as a fallback from name-based mostly so should be ok in my opinion

@simorenoh simorenoh changed the title Support addressing Cosmos databases and containers by RID Cosmos: Support addressing databases and containers by RID Jun 17, 2026
simorenoh and others added 3 commits June 17, 2026 14:25
Tightens two latent issues surfaced in PR review (no behavior change for
existing call sites):

- Unify the "is this request RID-addressed?" decision. `is_rid_addressed()`
  now also returns true when the leaf `id` carries a RID, so the raw-path
  routing can never diverge from `rid_signing_override` (which signs over
  the leaf RID). A reference signed RID-based is now always routed raw,
  preventing a latent signed-RID/name-encoded-path mismatch (opaque 401).
  Adds a `debug_assert!` in `read_container_by_rid` requiring a RID-based
  `DatabaseReference`, plus a regression test.

- Replace the `.expect()` panic in `DatabaseClient::resource_id()` with a
  typed `CosmosError`. Adds the `SERVICE_RETURNED_DATABASE_WITHOUT_RID`
  status (500 / 20306), mirroring `SERVICE_RETURNED_OFFER_WITHOUT_ID`, so a
  broken-server-invariant response fails cleanly instead of panicking.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Integrate the Public API Reorganization (#4512) and runtime/options
restructure (#4588) from main with the RID-addressing work on this branch.

Conflict resolutions:
- lib.rs / cosmos_client.rs / container_client.rs: adopt main's reorganized
  module layout and import grouping, re-adding the RID surface
  (resource_identity module, ResourceId/ResourceIdentity, ResourceIdentity
  imports).
- database_client.rs: keep the RID-aware resource_id() short-circuit; route
  throughput methods through it and delegate the missing-_rid error path to
  main's resource_id_or_error helper.
- cosmos_status.rs: drop the duplicate 20306 status; main's generic
  SERVICE_RETURNED_OBJECT_WITHOUT_RID is now canonical.
- cosmos_driver.rs: keep fetch_container_by_rid; adopt main's fallible
  CosmosDriver::new signature.
- CHANGELOGs: combine both crates' Unreleased entries.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add a Breaking Changes entry for azure_data_cosmos noting that database_client/container_client moved from &str to impl Into<ResourceIdentity>, a compile-time-only source change affecting deref-coercion call sites.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@analogrelay analogrelay left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking stuff is minor. Some non-blocking thoughts on pushing ResourceIdentity down.

Comment on lines +383 to +390
/// Under the no-mix addressing rule a RID database implies a RID container,
/// so checking the container and database covers database, container, and
/// item operations. The leaf `id` is also checked so the raw-path decision
/// can never diverge from [`rid_signing_override`](Self::rid_signing_override),
/// which signs over the leaf RID when `id` carries one: a request signed as
/// RID-based must always be routed RID-based (raw), otherwise the gateway
/// rejects it with an opaque `401`. Offers are handled separately and are
/// not considered RID-addressed for path-encoding purposes.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we debug_assert! this somewhere? That if a reference is created with RIDs, all the pieces are RIDs and if it's created with Names, all the pieces are Names?

Comment on lines +447 to +461
/// Reads a container's properties by database and container RID.
///
/// Like [`read_container_by_name`](Self::read_container_by_name) but addresses
/// the container by RID. `database` must itself be a RID-based
/// [`DatabaseReference`] so the request path is fully RID-based
/// (`/dbs/{db_rid}/colls/{container_rid}`).
pub fn read_container_by_rid(
database: DatabaseReference,
container_rid: impl Into<std::borrow::Cow<'static, str>>,
) -> Self {
let resource_ref: CosmosResourceReference = CosmosResourceReference::from(database)
.with_resource_type(ResourceType::DocumentCollection)
.with_rid(container_rid.into());
Self::new(OperationType::Read, resource_ref, None)
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is an infallible API, I think Copilot has it right here. We should either make this API fallible and have it fail if it's given a non-RID database reference, or change the API to stop that. The debug assert is good, but we need to consider what happens in production.


### Bugs Fixed

- RID-addressed data-plane requests are now authenticated correctly. The driver signs RID-addressed databases, containers, and their feeds/children over the lowercased resource RID (matching the service's `is_name_based = false` rule) and sends the RID raw in the request URL instead of percent-encoding it. Previously the driver always signed the full name-style resource link and percent-encoded the RID, which made the gateway reject RID reads (e.g. `401` for a database or container read by RID). Reading a database or container by RID, listing/querying containers under a RID database, and querying items, reading throughput, and creating items on a RID-addressed container now work end-to-end. ([#4513](https://github.com/Azure/azure-sdk-for-rust/issues/4513))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CHANGELOG entries should reference the PR, not the issue

Comment on lines 327 to 331
fn assert_send<T: Send>(_: T) {}
let client: &DatabaseClient = todo!();
assert_send(client.container_client(todo!()));
assert_send(client.container_client(todo!() as ResourceIdentity));
assert_send(client.read(todo!()));
assert_send(client.query_containers(Query::from("SELECT * FROM c"), todo!()));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be a valid cast, it'll just fail at runtime (and this is just testing compile-time conditions), but we should use something explicit rather than as ResourceIdentity.

/// Returns the identifier of the Cosmos database.
/// Returns the identifier used to construct this client: the database name
/// when addressed by name, or the RID string when addressed by RID.
pub fn id(&self) -> &str {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this could be a name or a RID, we should return &ResourceIdentity.

}

/// Returns the identity (name or RID) used to construct this client.
pub fn identity(&self) -> &ResourceIdentity {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this and use id to return the ResourceIdentity reference.

We can break this API pre-1.0, especially since name() gives the same results as the old id().

/// allocates an owned copy.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
#[non_exhaustive]
pub enum ResourceIdentity {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this belongs in the driver, and DatabaseReference (et al) can accept it instead of having separate _from_name and _from_rid constructors.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would also allow you to have one API in the driver resolve_container instead of two variants.

@github-project-automation github-project-automation Bot moved this from Todo to Changes Requested in CosmosDB Rust SDK and Driver Jun 17, 2026
simorenoh and others added 3 commits June 18, 2026 16:22
…sing

A4/A5: DatabaseClient::id() now returns &ResourceIdentity and the redundant identity() accessor is removed. Add From<&ResourceIdentity> for ResourceIdentity so round-tripping a client's id() back into database_client/container_client preserves the addressing mode (previously a RID id() round-tripped as a name).

A3/C2: replace the 'todo!() as ResourceIdentity' cast in the Send assertion with an explicit typed binding.

A1: add a debug-only addressing-consistency assert in CosmosResourceReference::compute_paths that rejects a database/container parent chain (and database/container leaf id) mixing name and RID addressing. Item/sub-resource leaf ids stay exempt. Split the prior regression test into a release-mode safety-net check and a should_panic debug check.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per repo convention, CHANGELOG entries link the PR (#4613), not the issue. Also corrects the feature entry to describe id() returning the ResourceIdentity (the identity() accessor was removed) and adds the missing PR link to the driver RID signing bug-fix entry.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make mixed name/RID addressing unrepresentable and reject it at the driver boundary, not just the SDK layer.

read_container_by_rid now takes (account, db_rid, container_rid) and builds the RID-based DatabaseReference itself, so a name-addressed parent can no longer be passed; the sole caller already decodes db_rid, so the call site is simpler and the debug_assert is gone.

CosmosResourceReference gains validate_addressing(), a release-mode counterpart to the debug-time addressing assertion. CosmosDriver::execute_operation calls it once per operation, turning a mixed reference into a deterministic CLIENT_MIXED_NAME_RID_ADDRESSING error in every build instead of an opaque gateway 401. The check is an in-memory field comparison and a no-op for consistent references, so request flow and network calls are unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
simorenoh and others added 2 commits June 25, 2026 09:53
Resolve CHANGELOG conflicts by relocating the #4613 RID entries from the now-released 0.36.0/0.5.0 sections into the current 0.37.0/0.6.0 Unreleased sections, keeping main's released bug-fix entries in place.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Cover the full RID-addressing flow against the emulator: create a
database and container by name, capture their service-assigned _rids,
then re-address both purely by RID and verify container read,
read_throughput, item create/read, single- and cross-partition item
queries, and query_containers all resolve to the same resources.

Add negative cases asserting that mixing name and RID addressing is
rejected (CLIENT_MIXED_NAME_RID_ADDRESSING) and that a container RID
from another database is rejected (CLIENT_INVALID_RESOURCE_ID).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@simorenoh simorenoh changed the title Cosmos: Support addressing databases and containers by RID [DO NOT REVIEW] Cosmos: Support addressing databases and containers by RID Jun 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Cosmos The azure_cosmos crate Do Not Merge

Projects

Status: Changes Requested

Development

Successfully merging this pull request may close these issues.

Cosmos: Allow Resolving Container and Database by RID

3 participants