Skip to content

Cosmos: RID-aware resource references and raw-path protocol (1/3)#4640

Open
simorenoh wants to merge 6 commits into
mainfrom
simorenoh/cosmos-rid-slice1-driver-protocol
Open

Cosmos: RID-aware resource references and raw-path protocol (1/3)#4640
simorenoh wants to merge 6 commits into
mainfrom
simorenoh/cosmos-rid-slice1-driver-protocol

Conversation

@simorenoh

Copy link
Copy Markdown
Member

Slice 1 of 3 (stacked) — epic #4637

This is the bottom of a 3-PR stack that re-lands the validated "address Cosmos databases/containers by RID" change as small, reviewable slices. Slice 1 is driver-internal only and has no public API surface. It is fully unit-tested.

What's in this slice

Driver references + the wire protocol for RID addressing:

  • models/resource_reference.rsContainerReference becomes RID-capable. New new_by_rid constructor and base_path() / is_by_rid() accessors; database_name() and name_based_path() now return Option (absent when the container was resolved purely by RID). Equality/hash key on account + container RID so name- and RID-resolved references for the same physical container collapse to one key.
  • models/cosmos_resource_reference.rs — the raw-path / lowercased-RID signing protocol: compute_paths raw-path handling, rid_signing_override, is_rid_addressed, ResourcePaths::is_rid_based, encode_path_segments, plus addressing_conflict / debug_assert_addressing_consistent (debug-only) and the full path/signing/round-trip unit tests.
  • models/mod.rs — export encode_path_segments to the driver crate.
  • driver/pipeline/operation_pipeline.rsbuild_transport_request sends RID-based paths raw and percent-encodes name-based paths (encoding the = padding of a base64 RID would make the gateway treat it as a name and reject the RID-based signature).
  • driver/routing/session_container.rs — session-token index keys on base_path() so RID- and name-addressed containers index consistently.

Deliberately deferred to later slices

  • validate_addressing + the CLIENT_MIXED_NAME_RID_ADDRESSING status → Slice 2 (resolution). Because of that, addressing_conflict and parent_chain_is_rid are #[cfg(debug_assertions)] here (debug-only consumers only) and will be un-gated by Slice 2.
  • Container resolution / cache / network and the SDK public API → Slices 2 and 3.

Deviations to keep this slice self-contained

  • A minimal compile shim in the otherwise out-of-scope driver/cache/container_cache.rs: database_name() now returns Option, so the name-key builder uses unwrap_or_default() and two test asserts wrap in Some(...). No Slice-2 RID-cache logic is pulled in; behavior is unchanged because Slice 1 never produces a RID-addressed ContainerReference in any real flow.
  • One intra-doc link to a Slice-2 symbol (resolve_container_by_rid) in a doc comment is reduced to plain prose to avoid a broken link.

Validation

  • cargo fmt, cargo build --all-features (debug + release), cargo clippy --all-features --all-targets — all clean, zero warnings.
  • cargo test --all-features --lib — 1998 passed, 0 failed, 64 ignored.

Slices 2 (resolution) and 3 (SDK public API) build on top of this.

Copilot AI review requested due to automatic review settings June 22, 2026 16:12
@simorenoh simorenoh requested a review from a team as a code owner June 22, 2026 16:12
@simorenoh simorenoh force-pushed the simorenoh/cosmos-rid-slice1-driver-protocol branch from 38005bf to f6199a7 Compare June 22, 2026 16:12
@github-actions github-actions Bot added the Cosmos The azure_cosmos crate label Jun 22, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This is slice 1 of a 3-PR stack (epic #4637) that re-lands "address Cosmos databases/containers by RID" in small, reviewable pieces. This slice is driver-internal only with no public API surface: it makes ContainerReference RID-capable and implements the wire protocol for RID addressing — RID-based requests are signed over the lowercased RID and sent with the path raw, while name-based paths continue to be percent-encoded. The work feeds the later slices that add RID resolution (slice 2) and the SDK public API (slice 3).

Changes:

  • ContainerReference now carries name-based addressing in an optional NameAddressing group; database_name()/name_based_path() return Option, new base_path()/is_by_rid()/new_by_rid() are added, and equality/hash key only on account + container RID.
  • CosmosResourceReference gains the raw-path / lowercased-RID signing protocol (is_rid_addressed, rid_signing_override, ResourcePaths::is_rid_based, encode_path_segments) plus debug-only mixed-addressing assertions, and the transport pipeline now sends RID paths raw vs. percent-encoding name paths.
  • Session-token indexing and the container name-cache shim are updated for the Option-returning accessors; a driver CHANGELOG entry is added under a new 0.6.0 (Unreleased) section.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
models/resource_reference.rs Makes ContainerReference RID-capable (NameAddressing, new_by_rid, base_path, is_by_rid); database_name/name_based_path now Option; equality/hash on account+container RID; sub-resource links use base_path(); new unit tests.
models/cosmos_resource_reference.rs Adds rid_based to ResourcePaths, RID signing override, is_rid_addressed, encode_path_segments, debug-only mixed-addressing checks, and extensive path/signing tests.
models/mod.rs Exports encode_path_segments to the driver crate.
driver/pipeline/operation_pipeline.rs Sends RID-based paths raw and percent-encodes name-based paths via encode_path_segments.
driver/routing/session_container.rs Keys the session-token index on base_path() so RID- and name-addressed containers index consistently.
driver/cache/container_cache.rs Minimal compile shim for the Option-returning database_name() (uses unwrap_or_default(); test asserts wrapped in Some(...)).
CHANGELOG.md Adds 0.6.0 (Unreleased) Bugs Fixed entry describing RID signing/routing fix.

A couple of points to highlight for the human reviewer: offers now percent-encode their base64 = padding (previously sent raw via Url::set_path), which by the PR's own RID/is_name_based=false rationale could regress read_offer/replace_offer and is only verifiable against a live/emulator account; and the central raw-vs-encoded pipeline branch lacks a build_transport_request test despite that module's strong path-assertion test conventions. This is the epic's explicitly "highest-risk" protocol-correctness work and touches request signing.

Comment thread sdk/cosmos/azure_data_cosmos_driver/src/models/cosmos_resource_reference.rs Outdated
Slice 1 of 3 (stacked, epic #4637) of the Cosmos RID-addressing work. This slice is driver-internal only and has no public API surface.

Add RID-capable ContainerReference (new_by_rid, base_path, is_by_rid; database_name/name_based_path now return Option) and the raw-path / lowercased-RID signing protocol in CosmosResourceReference (compute_paths, rid_signing_override, is_rid_addressed, ResourcePaths::is_rid_based, encode_path_segments). build_transport_request sends RID-based paths raw and percent-encodes name-based paths; session_container indexes on base_path.

validate_addressing and the CLIENT_MIXED_NAME_RID_ADDRESSING status are intentionally deferred to Slice 2; addressing_conflict/parent_chain_is_rid are debug-only here. A minimal compile shim in container_cache.rs adapts to the new Option-returning database_name.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@simorenoh simorenoh force-pushed the simorenoh/cosmos-rid-slice1-driver-protocol branch from 9317b7f to 357f0db Compare June 22, 2026 20:48
simorenoh and others added 4 commits June 22, 2026 18:30
Address review findings on Slice 1 of the RID-addressing series:

- Offers are signed over the lowercased RID (is_name_based=false), so
  their /offers/{rid} path must be sent raw. The offer branch produced
  rid_based=false, which percent-encoded reserved base64 characters in
  the RID and caused 401s for any offer whose RID contained +, /, or =
  padding. Flip the offer branch to rid_based=true so offers route raw,
  matching main's proven behavior. Verified against a live account: a
  raw RID read returns 200 while the same percent-encoded RID returns
  401.
- Add build_transport_request tests asserting the final Url::path():
  RID-addressed paths (incl. = padding and +) stay raw, name-addressed
  paths are percent-encoded, and offers route raw.
- Correct the encode_path_segments doc: RIDs are not URL-safe base64;
  they use a variant that maps / to - (so splitting on / is safe) but
  can still contain + and = padding, which is why RID paths go raw.
- Add a CHANGELOG Breaking Changes entry for the resource-reference
  accessors that now return Option, and note offers now route raw.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The RID test-fixture words (balxjyz, QBAA, QBAOW) were added to the
repo-wide .vscode/cspell.json. They are Cosmos-specific, so move them to
sdk/cosmos/.cspell.json (which imports the root config) to keep the
repo-wide dictionary free of service-specific noise.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Non-functional polish from the Slice 1 re-review (no behavior change):

- Rename session_container's name_path helper to index_path: it now
  returns base_path(), which is the RID-based path for RID-addressed
  containers, so name_path was misleading.
- Reword is_rid_addressed doc to state only the guaranteed direction of
  the routing/signing invariant (RID-signed implies raw-routed); the
  converse does not hold, so the two helpers must not be derived from
  each other.
- Call debug_assert_addressing_consistent in compute_feed_paths for
  symmetry with compute_paths (debug-only).
- Fix an encode_path_segments doc remnant that said the function splits
  on '/'; it iterates bytes and special-cases '/'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Document the latent by-name cache-key collision at the Slice-1/2 shim: database_name() is None for RID-addressed containers, so unwrap_or_default collapses the key to {endpoint, empty, name}. Harmless today (nothing emits RID references yet); Slice 2's resolution layer must handle it before caching RID-addressed containers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@Pilchie Pilchie left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but would like another set of 👀 on it too.

@FabianMeiswinkel FabianMeiswinkel left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Cosmos The azure_cosmos crate

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

4 participants