Skip to content

Cosmos: Add ContainerClient::patch_item via driver-side RMW#4386

Open
NaluTripician wants to merge 19 commits into
Azure:release/azure_data_cosmos-previewsfrom
NaluTripician:feature/rust-patch-rmw
Open

Cosmos: Add ContainerClient::patch_item via driver-side RMW#4386
NaluTripician wants to merge 19 commits into
Azure:release/azure_data_cosmos-previewsfrom
NaluTripician:feature/rust-patch-rmw

Conversation

@NaluTripician
Copy link
Copy Markdown

@NaluTripician NaluTripician commented May 11, 2026

Fixes #4048.

Summary

Adds ContainerClient::patch_item to azure_data_cosmos, restoring the only major write-side gap between the Rust Cosmos SDK and its peer SDKs (.NET / Java / Python / JS). PATCH is implemented as a driver-side Read-Modify-Write (RMW) loop behind a new OperationType::Patch, so any future SDK that consumes azure_data_cosmos_driver inherits PATCH for free and so the driver can later swap in native HTTP PATCH transparently without an SDK API change.

PATCH was previously removed in #3764 pending an idempotency model for retry. This PR ships RMW that is byte-for-byte equivalent to the backend evaluator for the supported op set; idempotency caveats for Increment under automatic retry are documented in both the rustdoc and PATCH_HANDLER_SPEC.md.

Approach

The RMW loop lives in azure_data_cosmos_driver behind a new OperationType::Patch. The SDK is a ~90 LOC wrapper around CosmosOperation::patch_item + ContainerClient::patch_item. Four new driver modules:

  • models/patch.rsPatchOp (Add / Remove / Replace / Set / Increment / MoveOp), IncrValue, PatchSpec. Both enums are #[non_exhaustive] for forward-compat. Provides both idiomatic Rust enum variants AND factory methods that mirror .NET (PatchOp::add(...), PatchOp::set(...), PatchOp::increment(...), PatchOp::move_op(...)).
  • driver/pipeline/patch_eval.rs — RFC 6901 JSON Pointer evaluator + per-op semantics matching the Cosmos backend evaluator. Hand-rolled (see Alternatives). 27 unit tests covering:
    • RFC 6901 escape grammar (~0~, ~1/)
    • Token-level descendant detection for move_op (not byte-prefix)
    • Atomic move via clone-stage-commit (failure leaves doc unchanged)
    • Set on numeric array index replaces in place (Cosmos backend semantics); Set with /- appends
    • Empty-array guard on parse_array_index (no panic on Set('/xs/0') against {xs: []})
  • driver/pipeline/patch_handler.rs — the bounded RMW loop:
    1. Pre-flight PK guard — rejects any op whose path equals or is an ancestor of any partition-key path, including the move-FROM source. Handles hierarchical PK (1–3 paths). Runs once before the loop.
    2. Internal OperationType::Read — uses the caller's OperationOptions (preserves caller's session_token + consistency level + latency budget). Extracts ETag + session_token from the response headers.
    3. No-ETag guardNone ETag → Err immediately, before any Replace is issued.
    4. apply_patch_ops on the deserialized body.
    5. Internal OperationType::Replace with Precondition::IfMatch(etag) and the Read response's session_token (overriding caller's) — closes the TOCTOU window so the Replace commits against the same replica view that produced the Read.
    6. On Err(412) and attempt < max_attempts - 1: continue. On any other error: propagate (typed via ErrorKind::HttpResponse { status, .. } so callers can downcast a 404 / 429 / 503 from inside the RMW). On exhaustion: ErrorKind::HttpResponse { status: PreconditionFailed, .. } with message "patch_item: ETag conflict after {N} attempts", chained to the last per-attempt 412 via Error::with_error.
  • driver/pipeline/from_local_body.rs — synthesizes the response the SDK bridge consumes (body from the locally-patched JSON; headers from the successful Replace).

The RMW loop is the first and only code path in the driver pipeline permitted to deserialize a response body to serde_json::Value. The existing Upsert/Batch dispatch branches inject request headers only; they do not touch the response body. The body-opacity exception is documented in ARCHITECTURE.md and docs/PATCH_HANDLER_SPEC.md.

Call flow

  1. SDK caller builds a PatchSpec (vec of PatchOps, optionally via the .NET-style factories) and calls container.patch_item::<T>(pk, item_id, spec, options).
  2. SDK wrapper builds CosmosOperation::patch_item(item_ref).with_body(spec) (+ optional with_patch_max_attempts from PatchItemOptions::max_attempts) and forwards to the driver via the existing execute_operation pipeline. No loop, no body deserialization in the SDK.
  3. Driver's execute_operation_pipeline detects OperationType::Patch and dispatches to patch_handler::execute(...) before the existing Upsert/Batch header-injection branches.
  4. patch_handler::execute runs the PK pre-flight, then the bounded RMW loop. Internal Read/Replace operations are re-entrant through the same pipeline (the dispatch branch fires only for top-level Patch).
  5. The handler returns a CosmosResponse synthesized via from_local_body_and_driver_headers — body from the locally-patched JSON, headers from the successful Replace.
  6. The SDK unwraps to ItemResponse<T> via the existing driver_response_to_cosmos_response bridge; callers invoke .into_model() to deserialize the synthesized post-image into their type.

Testing

  • 27 unit tests on apply_patch_ops + JSON Pointer (RFC 6901 escapes, fail-fast on first invalid op, atomic move, set-vs-add semantics, empty-array guard, token-level descendant detection).
  • patch_handler unit testsbuild_read_sub_op + build_replace_sub_op helper-level coverage of session-token propagation, ETag threading, If-Match precondition presence, typed error kinds, plus the 412-detection predicate.
  • Emulator test bodies — real test bodies (not stubs):
    • cosmos_patch_basic_set — Set roundtrip with post-image verified via the synthesized PATCH response and a follow-up Read.
    • cosmos_patch_pk_guard + cosmos_patch_pk_guard_hierarchical — preflight rejection on flat + MultiHash containers, no network call issued, including move-FROM-PK-path coverage.
    • cosmos_patch_cross_sdk_parity — 34-row fixture catalog whose expected behavior is derived from the equivalent .NET (ItemPatchSuccessTest, ItemPatchFailureTest, PatchOperationTests, PatchOperationTTests) and Java (PatchAsyncTest) tests, plus Rust-derived RFC 6901/6902 + i64-fidelity edge rows. Coverage rollup in tests/emulator_tests/cosmos_patch_cross_sdk_parity_coverage.md.
  • DriverTestClient::patch_item + hierarchical-PK container helpers test infrastructure.
  • _assert_futures_are_send extended for ContainerClient::patch_item (compile-time Send-bound regression guard).
  • cargo build / cargo clippy -D warnings / cargo fmt --check on azure_data_cosmos + azure_data_cosmos_driver.
  • cargo test -p azure_data_cosmos_driver --lib — 912 passed, 0 failed.
  • ⏭️ Deferred emulator tests (A7 / A8 / A12 / A13) — present, #[ignore]'d with unimplemented!() bodies citing the blocking fault-injection / latency-policy primitives. See "Deferred to follow-ups" below.

Risk Assessment

Dimension Value
Breaking change No
Behavior change Additive — new public API only
Performance impact Negative — RMW is 2+RTTs vs. native PATCH's 1RTT (documented in PatchItemOptions rustdoc # Latency section + PATCH_HANDLER_SPEC.md)
Test coverage delta +36 unit tests, +3 real emulator test bodies, +34-row cross-SDK parity fixture catalog, +4 deferred-emulator-with-unimplemented!()-guard, +1 Send-future compile-time test
Public API added ContainerClient::patch_item<T>, PatchItemOptions, PatchSpec / PatchOp / IncrValue (re-exported from driver; both #[non_exhaustive])
Public API removed None

Notable side effects, each with documented mitigation:

  • ETag conflict storm under hot-item contention — mitigated by bounded max_attempts, documented as not-for-hot-items in the rustdoc.
  • Session-token TOCTOU across Read → Replace — closed by threading the Read response's session_token into the Replace (caller's token is preserved only for the Read); helper-level regression tests added.
  • E2E latency policy 2+RTT lower bound — documented in PatchItemOptions rustdoc and PATCH_HANDLER_SPEC.md.

Alternatives Considered

Alt Approach Outcome
A SDK-side RMW Rejected. Driver-side RMW means any future SDK that consumes the driver inherits PATCH for free; SDK-side would duplicate the loop into every future SDK.
B Native HTTP PATCH wire path with backend evaluation Deferred. Out of scope for v1. RMW v1 ships first; the future native-PATCH swap is non-breaking under the chosen API signature.
C Driver-side RMW + opt-in native PATCH mode Deferred. Doubles the test matrix without a v1 customer ask.
D Vetted JSON Pointer crate (jsonptr) Reconsidered, hand-roll retained. 27 unit tests cover the RFC 6901 surface; Cosmos's Add-vs-Set + atomic-move + empty-array semantics would each need a wrapper layer around any 3p crate. Follow-up trigger: RFC 6902 Test op or relativized pointers.
E max_attempts: u8 with 0 → default 5 Rejected. Silent default behavior is a footgun. Option<NonZeroU8> makes "fail-fast on first 412" (Some(NonZeroU8::new(1))) explicit.
F Treat body-opacity as a parallel to Upsert/Batch Rejected. Upsert/Batch only inject request headers; they do NOT deserialize response bodies. PATCH is the first body-opacity exception.

Cross-SDK Impact

PATCH on Rust lands in three carefully-scoped relationships to the other Cosmos SDKs:

  • Public API surface mirrors the peer SDKs. PatchOp / PatchSpec / PatchItemOptions are named, shaped, and casing-aligned with Microsoft.Azure.Cosmos.PatchOperation (.NET) and CosmosPatchOperations (Java). The factory methods (PatchOp::add, PatchOp::set, PatchOp::increment, PatchOp::move_op) match the .NET static factories one-to-one. Callers porting code between SDKs see a familiar surface.
  • Wire protocol is intentionally different. Peer SDKs send native HTTP PATCH to the Cosmos backend. Rust sends a Read then a conditional Replace. The observable evaluation semantics (operator behavior, RFC 6901 pointer handling, partition-key guard) are designed to match what the .NET/Java tests observe today — that match is what cosmos_patch_cross_sdk_parity verifies.
  • Future evolution is non-breaking. When native PATCH becomes viable for Rust (idempotency model + backend support), the swap is internal to the driver. The public surface stays as-is.

This PR is intentionally Rust-only. No code change required in .NET / Java / Python / JS (they already ship PATCH); Go is a separate work item.

Review Loop Resolution

Two rounds of automated and manual review ran on this PR. The fixes that landed:

First-round changes:

  • patch_item now returns ItemResponse<T> instead of ItemResponse<()>, matching the read-item / create-item shape.
  • RMW handler threads the Read response's session_token into the Replace; the caller's session_token is still honored on the Read. Helper-level tests guard the threading.
  • The hand-rolled JSON Pointer evaluator stays (see Alternatives D), and the per-op surface is fully unit-tested.
  • PatchOp and IncrValue are both #[non_exhaustive].
  • Descendant detection runs at the JSON-Pointer token level (not byte-prefix), so /a is not a descendant of /ab.
  • Atomic move uses clone-stage-commit so a failed move leaves the doc unchanged.
  • Set on a numeric array index replaces in place (matching the backend); Set with /- appends.
  • PatchItemOptions::max_attempts rustdoc gained a # Latency section. PATCH_HANDLER_SPEC.md documents the 2+RTT lower bound.
  • Typed exhaustion error with the underlying 412 chained as Error::with_error.
  • Deferred emulator tests are #[ignore]'d AND call unimplemented!() so removing the ignore without wiring the test body fails loud.

Second-round changes:

  • parse_array_index empty-array guard — Set('/xs/0') against {xs: []} now surfaces ArrayIndexOutOfRange instead of panicking.
  • read_status_error now uses ErrorKind::HttpResponse for symmetry with the exhaustion error contract.
  • Test now asserts Precondition::if_match(etag) is present on the Replace, not just the session-token threading.

Copilot-review changes (commits in this PR):

  • The RMW loop's 412 detection now runs on Err(_) instead of Ok(_) — the driver pipeline returns 412 as Err(ErrorKind::HttpResponse{status: PreconditionFailed, ..}), not as a successful response with a status code. A is_precondition_failed predicate was added and the Replace path now uses match instead of ? so the loop can detect-and-continue on a 412 error.
  • The move-FROM-PK-path branch of the preflight guard was added (previously only the move-TO case was caught).
  • Rustdoc cleanup on PatchItemOptions::session_token, models/patch.rs examples, PATCH_HANDLER_SPEC.md algorithm description, create_container_with_pk_paths, and CHANGELOG entries.

Deferred to follow-ups

These are documented deferrals, not silent gaps:

  1. Fault-injection primitives — A7 / A8 require FaultInjectionErrorType::PreconditionFailed(412); A12 requires a header-strip primitive; A13 requires EndToEndOperationLatencyPolicy plumbing through patch_handler. The four deferred emulator tests are present, #[ignore]'d, and their bodies call unimplemented!("<id> deferred: needs <primitive>") so they fail loud if someone removes the ignore without doing the work.
  2. Typed PatchHandlerOptions side-channel — today max_attempts rides on CosmosOperation::with_patch_max_attempts(). A typed per-op handler-options struct accessible from CosmosOperation would scale better as more PATCH-only knobs land. The next PATCH-specific option is the forcing function.

Lineage

  • Cosmos: Remove PATCH API temporarily #3764 removed the original PATCH surface pending an idempotency model.
  • Cosmos: Restore PATCH API #4048 is the tracking issue this PR closes (Cosmos: Restore PATCH API).
  • Cross-SDK precedentsMicrosoft.Azure.Cosmos.PatchOperation (.NET v3 SDK), CosmosPatchOperations (Java v4 SDK). Public API naming and factory shape were aligned to these.
  • RFC 6901 (JSON Pointer) — pointer syntax + escape grammar for ~0 / ~1.
  • RFC 6902 (JSON Patch) — Add / Remove / Replace / Move op semantics, including the - (append) array index.
  • Driver pipelineOperationType::Patch slots into the existing dispatch model alongside Upsert/Batch; the body-opacity exception for PATCH is documented in ARCHITECTURE.md and docs/PATCH_HANDLER_SPEC.md.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

NaluTripician and others added 4 commits May 11, 2026 11:02
Logic Spren stages per-work-item handoff JSONs and a handoff README under
.coding-harness/ in the working tree as part of the Phase 1 -> Phase 2
bridge to the Coding Agent Harness. These files are intentionally local
to the implementer's checkout and must never enter the PR diff.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implement PATCH driver-side as a Read-Modify-Write loop behind a new
`OperationType::Patch` variant. The handler is the only code path in the
driver allowed to deserialize a data plane response body — every other
pipeline stage continues to treat payloads as opaque `Vec<u8>`.

Added:

- `PatchOp`, `IncrValue`, `PatchSpec` model types (add/set/replace/
  remove/increment/move ops). `IncrValue` preserves i64 fidelity.
- `CosmosOperation::patch_item()` + `with_patch_max_attempts()` setter
  so SDK wrappers can cap the RMW loop (default 5).
- `driver::pipeline::patch_eval` — RFC 6901 JSON Pointer evaluator
  hand-rolled to avoid an external dependency (~440 LOC + 19 unit tests).
- `driver::pipeline::patch_handler` — RMW loop with partition-key path
  pre-flight validation, ETag capture, `If-Match`-guarded Replace, and
  retry on 412 PreconditionFailed.
- `driver::pipeline::from_local_body` — single helper that synthesizes
  the final `CosmosResponse` from the locally-merged body plus the
  Replace's transport metadata.
- `CosmosDriver::execute_operation` early-dispatches PATCH ops via
  `Box::pin` (the handler re-enters `execute_operation` for its
  internal Read/Replace pair).

Docs:

- `ARCHITECTURE.md` documents the body-opacity exception scoped to the
  patch handler.
- `docs/PATCH_HANDLER_SPEC.md` describes the full algorithm, error
  surface, and invariants.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add the idiomatic SDK wrapper around `OperationType::Patch`. Mirrors the
shape of `replace_item` but accepts a `PatchSpec` (a list of
add/set/replace/remove/increment/move ops) and a dedicated
`PatchItemOptions` so callers can cap the driver's RMW retry budget.

Notes:

- `patch_item` never exposes `Precondition` — the handler manages
  `If-Match` internally. Only `session_token` and `operation` are
  plumbed through `apply_item_options`.
- `PatchSpec`, `PatchOp`, `IncrValue` are re-exported at the crate
  root (and through `models`) so callers don't have to reach into the
  driver crate.
- `PatchItemOptions::max_attempts` is `Option<NonZeroU8>` to avoid the
  `u8 == 0` footgun.
- README hero example shows a basic `set` + `increment` PATCH.
- CHANGELOG entry under 0.34.0 Features Added.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Stub out emulator tests for scenarios A5/A6/A7/A8/A9/A12/A13 from the
original RMW spec. All are `#[ignore]`'d because the driver test
harness does not yet expose a high-level `patch_item` helper; wiring
them up is tracked in the harness's implementation-state manifest as
follow-up work.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
NaluTripician and others added 9 commits May 11, 2026 12:11
Extends DriverTestClient with two helpers needed by the Phase 4 PATCH

emulator tests:

* `create_container_with_pk_paths` lets a test create a hierarchical

  partition-key container (MultiHash). The pre-existing

  `create_container` is now a thin wrapper that delegates with a single

  PK path.

* `patch_item` mirrors `read_item` but builds the operation body from

  a `PatchSpec` and forwards an optional `patch_max_attempts` override.

  The synthetic response from the RMW handler is returned verbatim so

  callers can assert on the locally-merged post-image.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Rewrites driver_patch.rs from Phase 3 `Ok(())` stubs into real

emulator-backed bodies for the canonical Logic Spren A-mapping:

* **A5** `cosmos_patch_basic_set` -- Set(/deleted, true) round-trips

  through the RMW handler and is observed in both the synthetic patch

  response and a follow-up read.

* **A6** `cosmos_patch_pk_guard` + `cosmos_patch_pk_guard_hierarchical`

  -- preflight rejects Set/Replace/Remove/Add/Move on partition-key

  paths for both flat and hierarchical containers, never issuing a

  network call.

* **A9** `cosmos_patch_compare` -- iterates a 34-row

  `PatchCompareCase` catalog (traceable to .NET ItemPatchSuccessTest /

  ItemPatchFailureTest / PatchOperationTests, Java

  itemPatchSuccessForNullValue, plus rust-derived RFC 6901/6902 + i64

  fidelity rows) and asserts each post-image or error substring.

A7/A8/A12/A13 stay `#[ignore]` with explicit TODOs citing the missing

fault-injection primitive (412 fault type, response-header strip,

EndToEndOperationLatencyPolicy plumbing). The coverage rollup lives in

`cosmos_patch_compare_coverage.md`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds the `_assert_futures_are_send` compile-time backstop used elsewhere

in the cosmos crates so future regressions that accidentally drop the

`Send` bound on the `patch_item` future (e.g. by capturing an `Rc`

or `RefCell` across an `.await`) fail at `cargo build` time instead

of surfacing as spawn-time runtime errors in caller code.

The body is never executed -- the `todo!()` placeholders are unreachable.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaces the placeholder `cosmos-patch-ref` plan with a concrete

contract for the deferred .NET PATCH oracle:

* stdin/stdout JSON schema (compatible with `PatchCompareCase` rows),

* error-kind taxonomy required for cross-SDK matching,

* harness wiring (`tokio::process`, `COSMOS_PATCH_REF_BIN` env,

  Test-Setup.ps1 append-only),

* acceptance checklist for the future PR that lands the binary.

The .NET helper itself remains deferred per .coding-harness/

implementation-state.json; only the README ships.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Address Phase 5 review findings F1, F2, F10 on PR Azure#4386.

F1 — typed PATCH return body:
  ContainerClient::patch_item is now generic over the response model
  (`patch_item<T>(...) -> ItemResponse<T>`), matching the read_item<T>
  pattern. The T: DeserializeOwned bound only kicks in at .into_model(),
  so callers that don't care about the body stay zero-cost. The rustdoc
  hero example and the README example were rewritten to deserialize the
  synthesized post-image into a user struct via .into_model(). A new
  `# Response Body` section calls out that the body is always
  synthesized locally — the SDK does not rely on the server's
  `content_response_on_write` setting.

F2 — session-token RMW (SE-004 mitigation):
  The PATCH RMW loop now (a) captures the caller's session token once
  before the loop and threads it through the internal Read so the read
  sees a session-consistent view, then (b) overrides that token with
  the Read response's session token on the internal Replace so the
  write commits against the same replica we just read from. The Read
  also forwards the caller's OperationOptions (consistency level,
  latency budget, throttling settings) instead of defaulting them,
  matching what an explicit Read by the caller would have done.

  The sub-op construction was extracted into `build_read_sub_op` /
  `build_replace_sub_op` and is covered by four new unit tests.

F10 — exhaustion error chains the last 412:
  After `max_attempts` 412s, patch_item now returns
  `ErrorKind::HttpResponse { status: PreconditionFailed, .. }` with
  a custom message (`"patch_item: ETag conflict after N attempts"`)
  and `Error::with_error` chains the per-attempt 412 as the source
  so callers can downcast or trace it.

Validated locally: cargo fmt --check, cargo build, cargo clippy
`-D warnings` (lib + tests), cargo test `-p azure_data_cosmos_driver
--lib` — 794 passed (baseline 790 + 4 new patch_handler tests).

Refs PR Azure#4386 review feedback (iteration 1).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Address Phase 5 review findings F4, F5, F6, F7 on PR Azure#4386.

F4 — token-level descendant check in `move_op`:
  Compare parsed JSON-Pointer tokens instead of raw byte prefixes when
  rejecting `move` into a descendant. The byte-prefix form happens to
  be correct for canonical RFC 6901 encodings (only '/' and '~' are
  escaped), but it relied on a fragile trailing-'/' guard. The new
  token-slice compare makes the invariant explicit and immune to any
  future encoding edge cases.

F5 — atomic `move_op` (clone-stage-commit):
  The previous `move_op` ran `remove` directly on the live document
  before `add_or_set`. If the destination's parent was missing, the
  source was already gone and the caller observed a partial state.
  We now stage both steps on a clone and only commit `*doc = staged`
  on success — a failed move leaves the document unchanged.

F6 — `#[non_exhaustive]` on `PatchOp` and `IncrValue`:
  Both enums are pre-1.0 and may need to grow (e.g. when Cosmos ships
  the conditional-PATCH server feature, or for new increment variants).
  Marking them `#[non_exhaustive]` reserves forward-compatible
  headroom without affecting internal exhaustive matches.

F7 — `Set` on a numeric array index replaces in place:
  Backend Cosmos PATCH semantics: "set is similar to add… in an array,
  Set replaces an existing element value". The previous `add_or_set`
  treated Set identically to Add for arrays (insert+shift). We now
  branch: numeric-token Set ⇒ `arr[idx] = value` (no shift, fails on
  out-of-range), trailing-'-' Set ⇒ append (matches Add), Add ⇒
  insert+shift as before. Three new unit tests cover the matrix
  (replace, append, out-of-range).

8 new patch_eval unit tests (27 total, was 19) + 0 driver lib
regressions. Validated locally: cargo fmt, cargo build, cargo clippy
`-D warnings` (lib + tests), cargo test `-p azure_data_cosmos_driver
--lib` — 800 passed.

Refs PR Azure#4386 review feedback (iteration 1).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Address Phase 5 review findings F8, F12, F13 (and document the Q3 / F3
override + F11 deferral in the harness state).

F8 — # Latency rustdoc on PatchItemOptions:
  Calls out the 2× single-RTT floor (PATCH is at minimum Read+Replace)
  and recommends sizing the end-to-end latency budget at >= 2x the p99
  budget for a plain Replace, plus headroom for 412 retries. Without
  this guidance a too-tight budget can cancel the RMW between the Read
  and the Replace and produce a timeout error even on a healthy
  service.

F12 — unimplemented!() in deferred emulator test bodies:
  The four deferred PATCH emulator tests (cosmos_patch_412_retry,
  cosmos_patch_412_exhaustion, cosmos_patch_no_etag_returns_error,
  cosmos_patch_e2e_latency_budget) previously returned Ok(()) — which
  would silently certify behavior we have not exercised the moment
  someone removes the `#[ignore]`. They now call unimplemented!()
  with citations to the matching `implementation-state.json#known_issues`
  rows (fault-injection-412, fault-injection-header-strip,
  e2e-latency-policy). The `#[ignore]` attributes are unchanged.

F13 — TODO(A9-deferred) inline in cosmos_patch_compare:
  The A9 fixture catalog's expected_props are hand-curated against the
  documented Cosmos PATCH semantics. A parallel run against the .NET
  SDK would catch silent semantic drift. Added a TODO(A9-deferred)
  comment immediately above `let cases = fixtures();` anchoring the
  deferred work to the exact site that needs the swap. Complements the
  broader contract at `tests/tools/cosmos-patch-ref/README.md`.

Q3 / F3 override and F11 deferral are recorded in the harness state
(`.coding-harness/implementation-state.json`, gitignored) — see
`.coding-harness/feedback-response-1.json` for the full disposition
table.

Validated locally: cargo fmt --check, cargo build, cargo clippy
`-D warnings` (lib + tests), cargo test `-p azure_data_cosmos_driver
--lib` — 800 passed, `cargo test -p azure_data_cosmos --lib` — 133
passed.

Refs PR Azure#4386 review feedback (iteration 1).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
F14: `parse_array_index(..., allow_append=false)` previously let `idx=0`
slip through on an empty array (because `len.saturating_sub(1) = 0`),
which made `Set('/xs/0', v)` and `Replace('/xs/0', v)` against `{xs: []}`
panic at the subsequent `arr[idx] = v`. Both call sites are reachable
from `ContainerClient::patch_item`, so the panic surfaces to user code
as a tokio JoinError. Guard the empty-array case explicitly and return
the typed `ArrayIndexOutOfRange` instead.

F15: `read_status_error` returned `ErrorKind::Other` for any non-success
status surfacing from the internal Read sub-op (and from the non-412
Replace path), while iter-1's F10 fix typed the 412 path as
`ErrorKind::HttpResponse`. Callers could downcast a 412 via
`error.http_status()` but had to string-match on a 404 / 429 / 5xx.
Mirror F10 — use `ErrorKind::HttpResponse { status, .. }` here too so
the error contract is symmetric across all PATCH failure modes.

F16: the iter-1 helper-level test `replace_sub_op_uses_read_response_session_token`
asserted operation type, body, and session_token but not the If-Match
precondition. A future refactor that silently dropped
`.with_precondition(Precondition::if_match(etag))` would have downgraded
the RMW to a non-conditional Replace, defeating R3-DRIVER's ETag guard.
Add the assertion via the existing `op.precondition()` accessor.

Tests:
* +4 patch_eval tests (Set/Replace on `/xs/0` against `{xs: []}` return
  ArrayIndexOutOfRange; Add on `/xs/0` and `/xs/-` against `{xs: []}`
  still succeed).
* +1 patch_handler test (`read_status_error` returns
  `ErrorKind::HttpResponse { status: NotFound, .. }`).
* +1 assertion on the existing F16 test.

Driver lib tests: 805 passed (was 800). SDK lib tests: 133 passed.
All canonical commands (fmt, build, clippy, test for both crates) green.

Addresses review-feedback-2.json findings: F14, F15, F16

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ADO Build Analyze stage flagged 3 unknown words and 4 relative links to
source files. None of these are real bugs but the gates are blocking.

CSpell:
- .gitignore: drop "Spren" from the harness comment (irrelevant detail).
- driver_patch.rs: change "nonexistentparent" needle to "nonExistentParent"
  so cspell tokenizes it as non+Existent+Parent. The needle match is
  case-insensitive (cosmos_patch_compare lowercases both sides at line
  825) so this preserves the assertion semantics.
- tests/tools/cosmos-patch-ref/README.md: canonicalising -> canonicalizing
  (cspell uses en-US).

Link verify:
- PATCH_HANDLER_SPEC.md: two markdown links pointed at .rs files. The
  Azure SDK link guidance disallows relative links outside .github/** and
  eng/** and the link verifier flags them. Replace with inline code paths
  that still let readers locate the file.
- cosmos_patch_compare_coverage.md: same treatment for the link to
  driver_patch.rs and to tools/cosmos-patch-ref/README.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@NaluTripician NaluTripician marked this pull request as ready for review May 11, 2026 21:06
Copilot AI review requested due to automatic review settings May 11, 2026 21:06
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds Cosmos DB item PATCH support to the Rust SDK by introducing a new ContainerClient::patch_item API in azure_data_cosmos, backed by a driver-side OperationType::Patch implemented as a bounded Read‑Modify‑Write (RMW) loop in azure_data_cosmos_driver.

Changes:

  • Added new public PATCH surface in azure_data_cosmos: ContainerClient::patch_item<T> and PatchItemOptions, plus re-exports of PatchSpec/PatchOp/IncrValue.
  • Added driver-side PATCH implementation: OperationType::Patch, PATCH operation factory/options plumbing, local RFC 6901 evaluator, and an RMW handler that synthesizes a post-image response.
  • Added emulator-backed E2E tests and documentation/spec artifacts for the PATCH behavior and future .NET oracle tooling.

Reviewed changes

Copilot reviewed 23 out of 24 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
sdk/cosmos/azure_data_cosmos/src/options/mod.rs Adds PatchItemOptions for configuring SDK PATCH requests.
sdk/cosmos/azure_data_cosmos/src/models/mod.rs Re-exports driver PATCH model types from the SDK models module.
sdk/cosmos/azure_data_cosmos/src/lib.rs Re-exports PATCH model types at the crate root.
sdk/cosmos/azure_data_cosmos/src/clients/container_client.rs Adds ContainerClient::patch_item<T> and a compile-time Send future assertion.
sdk/cosmos/azure_data_cosmos/README.md Adds a README example showing item PATCH usage.
sdk/cosmos/azure_data_cosmos/CHANGELOG.md Documents the new SDK PATCH API addition.
sdk/cosmos/azure_data_cosmos_driver/tests/tools/cosmos-patch-ref/README.md Adds a deferred contract/spec for a future .NET PATCH oracle tool.
sdk/cosmos/azure_data_cosmos_driver/tests/framework/test_client.rs Adds test helpers for hierarchical PK containers and driver PATCH dispatch.
sdk/cosmos/azure_data_cosmos_driver/tests/emulator_tests/mod.rs Registers the new emulator test module for PATCH.
sdk/cosmos/azure_data_cosmos_driver/tests/emulator_tests/driver_patch.rs Adds emulator-backed PATCH E2E tests and fixture-based comparison harness.
sdk/cosmos/azure_data_cosmos_driver/tests/emulator_tests/cosmos_patch_compare_coverage.md Documents coverage of the PATCH comparison fixture catalog.
sdk/cosmos/azure_data_cosmos_driver/src/models/patch.rs Introduces PATCH model types: PatchOp, PatchSpec, IncrValue.
sdk/cosmos/azure_data_cosmos_driver/src/models/mod.rs Wires PATCH models and adds OperationType::Patch.
sdk/cosmos/azure_data_cosmos_driver/src/models/cosmos_resource_reference.rs Adds helper to reconstruct ItemReference for PATCH handler sub-ops.
sdk/cosmos/azure_data_cosmos_driver/src/models/cosmos_operation.rs Adds PATCH operation factory and patch_max_attempts plumbing.
sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/patch_handler.rs Adds the driver-side PATCH RMW handler.
sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/patch_eval.rs Adds the local PATCH evaluator (RFC 6901 + Cosmos semantics).
sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/mod.rs Exposes new PATCH pipeline modules.
sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/from_local_body.rs Adds helper to synthesize a response from local body + Replace headers.
sdk/cosmos/azure_data_cosmos_driver/src/driver/cosmos_driver.rs Dispatches OperationType::Patch to the PATCH handler.
sdk/cosmos/azure_data_cosmos_driver/docs/PATCH_HANDLER_SPEC.md Adds a spec doc for the PATCH handler behavior.
sdk/cosmos/azure_data_cosmos_driver/CHANGELOG.md Documents the new driver PATCH operation and models.
sdk/cosmos/azure_data_cosmos_driver/ARCHITECTURE.md Documents the “body-opacity exception” for PATCH in the driver architecture.
.gitignore Ignores .coding-harness/ local artifacts directory.

Comment thread sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/patch_handler.rs Outdated
Comment thread sdk/cosmos/azure_data_cosmos/src/options/mod.rs Outdated
Comment thread sdk/cosmos/azure_data_cosmos_driver/src/models/patch.rs Outdated
Comment thread sdk/cosmos/azure_data_cosmos_driver/docs/PATCH_HANDLER_SPEC.md Outdated
Comment thread sdk/cosmos/azure_data_cosmos_driver/tests/framework/test_client.rs Outdated
Comment thread sdk/cosmos/azure_data_cosmos/CHANGELOG.md Outdated
NaluTripician and others added 3 commits May 11, 2026 14:36
F-C1 is a real correctness bug: the RMW loop's 412 detection in
`patch_handler::execute` was unreachable. `driver.execute_operation`
returns `Err(ErrorKind::HttpResponse { status: 412, .. })` for a 412 —
the pipeline maps 412 to `OperationAction::Abort` and returns `Err`
verbatim (see `operation_pipeline.rs:324..358`). `await?` therefore
propagated the error out of the loop and the `status_code() ==
PreconditionFailed` arm was dead code, so a single concurrent writer
would surface as a 412 to the caller instead of triggering retry.

The fix uses `match driver.execute_operation(...).await` on the
Replace result. A new `is_precondition_failed(&Error)` predicate
pattern-matches `ErrorKind::HttpResponse { status: 412, .. }`; on a
match the loop stashes the real service error in `last_412` and
continues. Any other Err propagates verbatim, preserving the original
`raw_response` and diagnostics. The Read side already wanted
propagation, so its dead `if !status.is_success()` post-check is
removed in favour of plain `await?` (with a comment explaining why).

F-C2 extends the PK-overlap preflight to also reject `MoveOp { from }`
when `from` overlaps a partition-key path; moving *out of* a PK path
deletes a PK component just as surely as moving into one.

F-C7 adds emulator test rows for "Move from /pk" (flat) and "Move from
/tenantId" (hierarchical MultiHash) covering the F-C2 fix.

F-C3 trims the `PatchItemOptions` rustdoc so the session-token field
is described as the canonical location rather than via the operation
sub-field.

F-C4 fixes the doctest in `models/patch.rs`: the previous `windows(7)
.any(... == b"\"set\"")` could never match a 5-byte substring.

F-C5 rewrites the Algorithm block in `PATCH_HANDLER_SPEC.md` to
reflect that the Read uses caller `OperationOptions`+session_token and
the Replace overrides with the Read response's session_token (the
SE-004 mitigation landed in iter-1 F2). Adds a session-token-threading
section explaining the rationale.

F-C6 rewrites the `create_container_with_pk_paths` doc to drop the
incorrect "service infers kind" claim — the helper explicitly sets
Hash vs MultiHash from `paths.len()`.

F-C8 trims the patch_item CHANGELOG entries in both crates to a
single user-facing line and points readers at PATCH_HANDLER_SPEC.md
for the RMW / ETag / session-token / PK guard details.

Driver lib: 805 → 809 tests (+4 net: +3 `is_precondition_failed`
predicate tests, +2 `validate_partition_key_paths` move-from tests,
-1 obsolete `read_status_error_uses_http_response_kind` test whose
helper is no longer called from production code). SDK lib: 133
(unchanged).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Build Analyze on the previous push surfaced two CI lint issues introduced

by the Copilot-review fix commit:

- cspell flagged `hpkc` (3 occurrences in `pk_guard_rejects_move_from_pk_path_hierarchical`) as an unknown word.

  Renamed the placeholder container id/name/rid to `multi_hash_container`

  variants so cspell tokenizes the snake_case parts as real English words.

- Link verification flagged the markdown link to `docs/PATCH_HANDLER_SPEC.md` in

  `azure_data_cosmos_driver/CHANGELOG.md` as a relative link in a directory

  outside the allow-list (only `.github/**` and `eng/**` may use relative

  links per Azure SDK link guidance). Converted to inline code so the path is

  still discoverable without violating the policy.

Local cspell pass against all changed files reports 0 issues; `pk_guard`

tests still pass after the rename.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The `tests/tools/cosmos-patch-ref/` directory was a placeholder README
for a future .NET reference oracle that was never built. The Rust PATCH
tests already verify the same observable behavior as the equivalent
.NET and Java tests via a curated fixture catalog (each row carries a
`source_test_id` pointing at the upstream test method), so this stub
no longer adds value.

Changes:
- Delete `tests/tools/cosmos-patch-ref/` (README-only directory).
- Rename the catalog test `cosmos_patch_compare` to
  `cosmos_patch_cross_sdk_parity` so the name reflects what the test
  actually does: run a cross-SDK behavioral parity check against the
  emulator. Also rename the companion coverage doc to match.
- Rewrite the coverage doc to drop references to the deferred .NET
  oracle and to add a short note on the canonical way to re-verify a
  row (run the source .NET / Java test and compare).
- Strip internal review-finding prefixes (`F4:`, `F5:`, `F7:`, `F12:`,
  `F14:`, `F16:`, `F-C1:`, `F-C2:`) from inline comments while
  preserving the engineering rationale that followed.
- Drop `; see implementation-state.json#known_issues` trailers from
  deferred-test ignore messages and `unimplemented!` strings.

No behavioral changes; tests pass (912 driver lib tests, no new
ignores, no new failures).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@NaluTripician NaluTripician changed the title feat(cosmos): add ContainerClient::patch_item via driver-side RMW Cosmos: Add ContainerClient::patch_item via driver-side RMW May 11, 2026
NaluTripician and others added 2 commits May 11, 2026 16:48
Address the deep-reviewer test-coverage findings on the Cosmos PATCH
RMW handler:

* **Refactor**: extract `pub(crate) trait SubOperationDispatcher` and a
  generic `execute_with_dispatcher` core in `patch_handler.rs`. The
  original `execute()` is now a thin wrapper. Zero public-API change,
  but the loop is now unit-testable with a scripted stub without a
  live emulator.
* **Drop `PatchSpec::condition`**: the field and `with_condition`
  builder were a silent no-op (handler never read them) and structurally
  misplaced vs peer SDKs, which expose the SQL filter on request
  options. Removed both; a new test pins that the JSON wire form has no
  `condition` field.
* **New unit tests** (driver `patch_handler.rs`):
  * `rmw_recovers_from_412_on_first_replace` — pins the 412 continue
    branch (2 Reads, 2 Replaces, If-Match wired).
  * `rmw_propagates_412_after_exhausting_max_attempts` — exhaustion.
  * `rmw_propagates_non_412_replace_error_immediately` — no retry on
    non-412 Replace failure.
  * `rmw_propagates_read_error_immediately` — non-412 Read short-
    circuits before any Replace.
  * `pk_guard_rejection_issues_no_sub_operations` — zero sub-ops issued
    when the preflight rejects.
  * `empty_patch_spec_issues_no_sub_operations` — no-ops short-circuit.
  * `exhaustion_error_with_source_chains_underlying_412` /
    `_without_source_is_still_412_shaped` — pin both arms.
  * `rmw_fails_without_etag_before_replacing` — ETag-less Read aborts.
  * `rmw_caller_session_token_reaches_read_then_replace_uses_read_response_token`
    — session-token plumbing caller -> Read -> Replace.
* **New emulator test**: `cosmos_patch_read_missing_item_returns_not_found`
  in `driver_patch.rs` patches a never-created id and asserts a 404 from
  the read leg propagates as a typed error.
* **New SDK-level integration tests**: `cosmos_patch.rs`
  (`patch_item_round_trip`, `patch_item_missing_returns_not_found`,
  `patch_item_honors_max_attempts_option`) round-trip the public
  `ContainerClient::patch_item` surface end-to-end.

Validated: `cargo fmt --check`, `cargo clippy -D warnings` on the
driver (SDK clippy is clean modulo 3 pre-existing warnings in untouched
files), and `cargo test -p azure_data_cosmos_driver --lib --all-features`
-> 923 passed, 0 failed (12 new patch_handler tests + 1 new patch model
test).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Round 2 deep review surfaced two issues; close both.

* **B-1 (blocking)**: `patch_item_round_trip` in the new SDK
  integration test (`cosmos_patch.rs`) used JSON Pointer
  `/displayName` against a document whose serialized field is
  `display_name` (snake_case — no `serde(rename)` on the test struct).
  `PatchOp::Replace` is strict (see `patch_eval.rs:347-349` and
  `replace_missing_leaf_fails`), so the test would have failed with
  `MissingTarget` the first time it was actually run against the
  emulator. The `test_category = "emulator"` gate hid the failure
  from `cargo test`.

  Fix: change the pointer to `/display_name` so it lines up with the
  on-the-wire shape.

* **N-1 (nit)**: `rmw_caller_session_token_reaches_read_then_replace_uses_read_response_token`
  promised more than it actually checked — `DispatchedCall` records
  only `(op_type, if_match_etag)`, not session tokens. The
  session-token wire-up is already covered by the per-builder unit
  tests `read_sub_op_propagates_caller_session_token` and
  `replace_sub_op_uses_read_response_session_token`.

  Fix: rename to `rmw_loop_dispatches_read_then_etag_guarded_replace`
  and expand the doc comment to explain the test pins the structural
  Read -> Replace sequence + ETag inheritance only, with a pointer
  to the helpers that pin the session-token contract.

Verified: `cargo test -p azure_data_cosmos_driver --lib --all-features`
-> 923 passed, 0 failed. `cargo clippy -p azure_data_cosmos_driver
--lib --tests --all-features -- -D warnings` clean. `cargo fmt --check`
clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Cosmos The azure_cosmos crate

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

2 participants