Skip to content

[Cosmos][Spec] Restore PATCH (Partial Document Update) for Rust SDK#4338

Closed
NaluTripician wants to merge 4 commits intoAzure:release/azure_data_cosmos-previewsfrom
NaluTripician:spec/issue-4048-patch-operations-previews
Closed

[Cosmos][Spec] Restore PATCH (Partial Document Update) for Rust SDK#4338
NaluTripician wants to merge 4 commits intoAzure:release/azure_data_cosmos-previewsfrom
NaluTripician:spec/issue-4048-patch-operations-previews

Conversation

@NaluTripician
Copy link
Copy Markdown

@NaluTripician NaluTripician commented May 4, 2026

Summary

Adds a design spec for restoring the Cosmos PATCH (Partial Document Update) API to the Rust SDK. The previous public surface was removed in PR #3765 because the original design did not safely interact with the Rust SDK's auto-retry-all-writes default — a transparent retry of a PATCH containing an increment or add operation could silently double-apply the change. This PR proposes a phase-1 design that ships PATCH only for the unconditionally-idempotent subset (set / replace / remove) and disallows add / increment / move at the SDK boundary, with a phase-2 follow-up reserved for non-idempotent operations via Read-Modify-Write.

This PR contains the spec only — no source code changes. The spec lives at sdk/cosmos/azure_data_cosmos_driver/docs/PATCH_OPERATIONS_SPEC.md, mirroring the location and format of the Feed Operations spec (PR #4327) that PATCH must align with.

Issue

Implements #4048.

Related: #3765 (the removal we are reversing); #4327 (Feed Operations Spec — the planner model PATCH must align with); #4156 (PPAF/PPCB landed the second write-retry pathway PATCH has to coexist with).

Root Cause / Background

Cosmos PATCH is conditionally idempotent. A patch body containing only set / replace / remove operations is safe to apply twice — applying the same body again yields the same final document. A patch body containing add / increment (incr on the wire) / move is not idempotent: a silent retry after the server has already applied the patch will double-apply the operation.

The Rust SDK retries all writes by default. There are at least four retry pathways in retry_evaluation.rs that interact with PATCH — and only two of them gate on is_idempotent():

  1. The standard HTTP retry-trigger gate at retry_evaluation.rs lines 406–408 — a three-way OR (is_read_only() || is_idempotent() || ppaf_write_retry_allowed).
  2. The transport-error retry path at retry_evaluation.rs lines 523–525 — same three-way OR.
  3. try_handle_server_error at retry_evaluation.rs:457 (5xx and 408 responses) — fires unconditionally; no is_idempotent() check at all. The reasoning encoded there is "an internal error in one region is unlikely to repeat in another," so the request is replayed cross-region.
  4. try_handle_write_forbidden at retry_evaluation.rs:289 (403 sub-status 3) — fires unconditionally; no is_idempotent() check at all.

A correct phase-1 design therefore cannot rely on flipping is_idempotent() to gate retry — the request body itself must be safe to replay under (3) and (4). PPAF (ppaf_write_retry_allowed) was introduced specifically to allow non-idempotent write retries cross-region, per the doc comment at pipeline/components.rs:92 — so reading the OR as "PPAF is gated by is_idempotent" inverts its design.

PR #3765 removed the public PATCH API rather than ship a design that handled both pathways correctly. The driver-side plumbing (Method::Patch in auth signing, FaultOperationType::PatchItem, the SDK-side OperationType::Patch legacy variant) was deliberately retained — there is even an explicit // PatchItem will be mapped when OperationType::Patch is added to the driver. TODO at azure_data_cosmos_driver/src/fault_injection/mod.rs:191 — because the team understood PATCH would return.

This spec settles the design.

Fix / Implementation

The proposed phase-1 design makes OperationType::Patch::is_idempotent() return true unconditionally, and makes that claim sound by narrowing the public API: PatchOperation is a #[non_exhaustive] enum exposing only Set / Replace / Remove, with no Deserialize impl. Customers cannot construct a non-idempotent patch because no public symbol exists by which they could express Add, Increment, or Move. The load-bearing property is document-level wire-body idempotency — every PatchDocument value that can ever be constructed produces a wire body that is safe to replay, which is the property required by all four retry pathways above (including the two that don't consult is_idempotent() at all).

The maintainer guardrail is an in-crate exhaustive match test over PatchOperation (no _ arm). #[non_exhaustive] does not suppress exhaustiveness checks inside the defining crate, so adding a new variant fails to compile that test, forcing a phase-2 redesign before any non-idempotent op can ship.

The full design — type model, options struct shape, driver wiring (OperationType::Patch variant, CosmosOperation::patch_item factory, fault-injection mapping), test plan, and rollout — is in the spec document.

Alternatives considered (full reasoning in §7 of the spec):

  • A. Read-Modify-Write loop for non-idempotent ops in phase 1. Rejected for phase 1 due to scope: introduces a second item-write codepath with its own retry budget, RU-cost story, and observability surface. RU consumption diverges from .NET. Reserved for phase 2; the #[non_exhaustive] enum keeps the door open.
  • B. Per-call is_idempotent() downgrade. Rejected because it loses the auto-retry safety net for the very workloads (counters, telemetry rollups) that benefit most, AND because it would only gate two of the four+ retry pathways — try_handle_server_error (5xx) and try_handle_write_forbidden (403/3) retry cross-region without consulting is_idempotent(). Defeating those is a cross-cutting retry-policy redesign, not a one-line change. Also requires changing OperationType from Copy to per-instance state.
  • C. unsafe_allow_non_idempotent_patch opt-in flag. Rejected: an unsafe_* flag is a smell that the underlying primitive isn't designed yet. No motivating use case (OpenAI's primary PATCH use is set for soft-delete, which phase 1 covers).
  • D. Restore the API exactly as it was before Remove Cosmos PATCH API temporarily #3765. Rejected because this is precisely what reviewers in Remove Cosmos PATCH API temporarily #3765 said was unacceptable — it would silently double-apply increments under retry.

Problem Space Analysis

Landscape confidence: HIGH (full report in .coding-harness/landscape-report.md — note: this file is gitignored as a session artifact; full text is included as a collapsed section below)

Blast Radius

The full call graph and cross-cutting concern sweep is in §1 and §3 of the landscape report. Highlights:

Confirmed unaffected:

  • Auth / token refresh: Method::Patch is already mapped in both authorization_policy.rs:131 and SDK signature_target.rs:57.
  • Async execution model: PATCH is a single point op — same shape as replace_item.
  • Connection pooling / session-token propagation: reuses the existing item-write path.
  • Configuration / client construction: no new options.

Interacts with this change:

  • Retry policy (retry_evaluation.rsis_idempotent() gate at 406–408 / 523–525, try_handle_server_error at 457, try_handle_write_forbidden at 289) — phase-1 wire body is document-level idempotent so all four pathways are safe regardless of which one fires. (SE-001)
  • PPAF / PPCB (ppaf_write_retry_allowed) — designed for non-idempotent writes; same wire-body argument applies. (SE-001)
  • OperationType enums — driver-canonical enum needs the Patch variant; legacy SDK enum already has it. (SE-002)
  • Fault injection — driver-side mapping is a known TODO. (SE-003)
  • Wire format — Cosmos uses incr / separate set/add (NOT RFC 6902); wire field is condition, not filterPredicate. (SE-004, SE-005)
  • Transactional batch — other SDKs include patch-in-batch; Rust does not yet. Out of scope here. (SE-006)

Decision Log

Decision Alternatives Why This Choice
Phase 1 disallows add / increment / move at the SDK boundary (A) RMW loop now; (B) per-call is_idempotent() downgrade; (C) unsafe_* flag; (D) restore unchanged Smallest change that ships customer value (OpenAI's set-for-soft-delete is the largest known use case), keeps is_idempotent() == true sound (the type system enforces it), and preserves the phase-2 path via #[non_exhaustive].
OperationType::Patch added to the driver enum (canonical) Add only to SDK-side legacy enum; reconcile both now Driver enum is the canonical one going forward. Note: this is a new decision — PR #4128 added zero new variants to either OperationType, so the precedent is "leave the legacy enum alone," not "only add to driver." Reconciling both enums is a separate sweep tracked as a follow-up.
Public Rust name for wire-condition is filter_predicate Use condition directly Matches .NET's customer-facing vocabulary so cross-SDK docs are coherent. Wire-level name remains condition.

Full decision reasoning is in §4 and §7 of the spec document.


⚠️ Potential Side Effects & Risk Surface

[SE-001] 🟡 Idempotency × auto-retry-all-writes interaction

Affected component: azure_data_cosmos_driver::driver::pipeline::retry_evaluation + OperationType::is_idempotent
Conditions: A naïve PATCH design that allowed add / increment / move and returned true from is_idempotent() would silently double-apply on retry. There are at least four retry pathways in retry_evaluation.rs (the standard is_idempotent() gate at 406–408 / 523–525, PPAF non-idempotent retry, try_handle_server_error 5xx at 457, try_handle_write_forbidden 403/3 at 289) — three of them retry without consulting is_idempotent(). The wire body itself must be safe to replay.
Assessment: Potential, not breaking — phase 1 ships an additive API and we choose the design to avoid the failure mode.
Mitigation: Public PatchOperation enum exposes only Set / Replace / Remove (no Deserialize impl). Every constructible PatchDocument produces a wire body that is unconditionally idempotent at the document level — the property required by every retry pathway. Maintainer guardrail: an exhaustive match test in the defining crate fails to compile when a new PatchOperation variant is added (#[non_exhaustive] does not suppress exhaustiveness checks inside the defining crate), forcing a phase-2 redesign before any non-idempotent op can ship. Phase-2 RMW for non-idempotent ops is explicit follow-up work.

[SE-002] 🟡 Two OperationType enums diverge on Patch support

Affected component: azure_data_cosmos::operation_context::OperationType (legacy) vs azure_data_cosmos_driver::models::OperationType (canonical)
Conditions: SDK-side enum has the Patch variant; driver-side does not. Implementer reading only one enum may miss the other.
Assessment: Potential — consistent with the in-progress driver cutover but a known correctness hazard.
Mitigation: Spec adds the variant to the driver enum and leaves the legacy enum's existing variant unchanged. Adding a variant to the driver enum is a new decision (PR #4128 added zero new variants); the cited precedent only covers the "leave the legacy enum alone" half. Full reconciliation is tracked as a separate follow-up.

[SE-003] 🟢 Driver fault-injection PatchItem mapping is a known TODO

Affected component: azure_data_cosmos_driver::fault_injection
Conditions: Customer using FaultOperationType::PatchItem expects fault rules to match patch_item calls.
Assessment: Minor — the TODO at fault_injection/mod.rs:191 was placed for exactly this work item.
Mitigation: Resolved in the same change (M8 / AC9). TODO comment removed in the same commit.

[SE-004] 🟡 Wire op tokens are NOT RFC 6902

Affected component: PatchOperation serde representation
Conditions: Implementer copying RFC 6902 conventions removes the #[serde(rename)] attributes and emits the wrong wire tokens.
Assessment: Potential — caught only by integration tests against the service.
Mitigation: Spec §4.1 references the prior implementation (preserved at .coding-harness/removed_patch_operations.rs.diff) as the source of truth. A unit test asserts byte-exact wire body including op tokens.

[SE-005] 🟢 Wire field is condition, not filterPredicate

Affected component: PatchDocument serde representation
Conditions: Implementer takes .NET's FilterPredicate public name as the wire name.
Assessment: Minor.
Mitigation: Public Rust name is filter_predicate; #[serde(rename = "condition")] annotates the wire form. Unit test asserts the wire field name explicitly.

[SE-006] 🟢 Transactional batch PatchItem is unsupported

Affected component: azure_data_cosmos::transactional_batch
Conditions: Customer expects feature parity with .NET's TransactionalBatchPatchItemRequestOptions.
Assessment: Minor — explicit feature gap.
Risk accepted: Out of scope for this work item; a follow-up issue will be filed.


What Reviewers Should Focus On

  1. The phase-1 disallow trade-off (§7 and §8 of the spec). Is rejecting add/increment/move at the SDK boundary acceptable, or should phase 1 ship the RMW loop instead? Ashley's task-description guidance and OpenAI's primary set-for-soft-delete use case point to disallow; @analogrelay should confirm.
  2. The soundness argument for is_idempotent() == true (spec §4.4). The argument is "the type system makes the wire request unconditionally idempotent at the document level; therefore every retry pathway in retry_evaluation.rs is safe — including the three (PPAF, try_handle_server_error 5xx, try_handle_write_forbidden 403/3) that retry without consulting is_idempotent() at all." @FabianMeiswinkel should confirm there are no phase-1 retry pathways where document-level wire-body idempotency is insufficient.
  3. The two-OperationType-enums posture (SE-002). We are deliberately leaving them divergent in this PR and tracking reconciliation separately. Confirm this matches the rolling-cutover plan rather than fighting it.

Lower-risk areas (fully traced, safe to skim):

  • Auth / signing: Method::Patch already wired correctly.
  • Wire-format details: explicit unit-test coverage; prior removed code preserved as reference.
  • Public ContainerClient::patch_item shape: identical to replace_item.

Testing

  • N/A — spec-only PR, no compilable changes.
  • Test plan defined in spec §6 (unit + driver-unit + integration + doc tests). Tests will be added in the implementation PR.

Integration tests not included: spec PR; will be added in the implementation PR per the work-item directive that this PR be spec-only.


Risk Assessment

  • Breaking change: No. Spec is documentation-only.
  • Performance impact: None (no executable code).
  • Behavior change for existing users: None — PATCH is currently absent from the public API; spec describes its restoration but does not yet implement it.
  • Compatibility impact: None for this PR. The implementation PR will be additive at the public API surface.

Cross-SDK Impact

SDK Status
.NET Not affected — used as the cross-SDK reference for vocabulary (FilterPredicate) and wire tokens
Java Not affected — should be cross-checked during implementation phase per work-item note about Java alignment
Python Not affected
Go Not affected
JavaScript Not affected
Rust Affected — this PR's spec; implementation in a follow-up

Landscape Report

The full landscape report (landscape-report.md) and machine-readable side-effect catalog (landscape.json) are session artifacts under .coding-harness/ (gitignored). Key findings inlined in the Problem Space Analysis and Side Effects sections above. Happy to paste the full report into a PR comment if reviewers prefer.

This commit adds a design spec for restoring the Cosmos PATCH API removed in
PR Azure#3765. The spec proposes a phase-1 design that exposes only the
unconditionally-idempotent subset of operations (set / replace / remove) and
explicitly disallows add / increment / move at the SDK boundary so that the
wire request is always safe under the auto-retry-all-writes default. A phase-2
follow-up will revisit non-idempotent operations via Read-Modify-Write.

The spec is informed by a full Landscape Analysis covering call graph, retry
semantics (including PPAF/PPCB), driver/SDK split state, and cross-SDK
behavioral parity with .NET. Six side effects were classified; the central
one (idempotency interaction with auto-retry) is mitigated by narrowing the
public PatchOperation enum to the idempotent subset.

This is a spec-only change. No source code changes are made.

Refs: Azure#4048
Refs: Azure#3765 (the removal we are reversing)
Refs: Azure#4327 (Feed Operations Spec — Planner alignment)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions github-actions Bot added the Cosmos The azure_cosmos crate label May 4, 2026
@NaluTripician NaluTripician marked this pull request as ready for review May 4, 2026 21:12
@NaluTripician NaluTripician requested a review from a team as a code owner May 4, 2026 21:13
Copilot AI review requested due to automatic review settings May 4, 2026 21:13
…st, AC/test gaps)

Addresses PR Deep Reviewer iteration 1 findings:
- 1/2/3: Rewrite section 4.4 idempotency soundness. The driver retry
  surface has at least four pathways in retry_evaluation.rs and only
  two consult is_idempotent(); try_handle_server_error (5xx, line 457)
  and try_handle_write_forbidden (403/3, line 289) retry cross-region
  unconditionally, and PPAF was designed for non-idempotent retries.
  Soundness now rests on document-level wire-body idempotency, not
  on a // SAFETY: comment over a single retry gate.
- 4/5: Section 4.2 PatchItemOptions concrete shape and ItemWriteOptions
  applicability table.
- 6: Remove-op unit test now requires byte-exact assertion.
- 7: Section 4.5 corrects PR Azure#4128 precedent characterization (added
  zero new variants; precedent is leave-legacy-alone, not only-add-to-driver).
- 8: Adds 4 acceptance criteria + integration tests (etag x predicate
  conjunction, soft-delete-of-soft-deleted, partition split, PPCB).
- 9: Section 11.2 open question rewritten for FabianMeiswinkel.
- Polish 10/11/12: rename_all serde note, README example with
  filter_predicate + content_response_on_write=false, phase-2 constraints
  subsection in Out of Scope.
- Section 7.2 Alternative B rejection extended to mention the 5xx and
  403/3 retry paths that bypass is_idempotent().

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a Cosmos PATCH design spec for the Rust SDK and documents a proposed phase-1 restore of PATCH using a restricted operation set intended to fit the current driver/retry architecture. It also adds a repo-level ignore for local spec-generation artifacts referenced during the design process.

Changes:

  • Add PATCH_OPERATIONS_SPEC.md describing the proposed PATCH API surface, retry/idempotency model, driver integration, risks, and test plan.
  • Outline the future implementation work across SDK, driver, tests, README, and changelog files.
  • Ignore local .coding-harness/ artifacts in the repository root .gitignore.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 7 comments.

File Description
sdk/cosmos/azure_data_cosmos_driver/docs/PATCH_OPERATIONS_SPEC.md New PATCH restoration spec covering requirements, API shape, driver wiring, tests, and rollout plan.
.gitignore Ignores local .coding-harness/ spec-generation artifacts.

Comment on lines +60 to +62
- Operation `op` tokens are: `set`, `replace`, `remove` (lowercase).
- The serde representation is preserved from the implementation removed in
PR #3765, recovered at `.coding-harness/removed_patch_operations.rs.diff`.
Comment on lines +79 to +80
- **M9.** Per-call options support `if_match_etag` (existing
`ItemWriteOptions` pattern) so customers can build classic
### 2.2 Should

- **S1.** The public Rust name for the wire-level `condition` field is
`filter_predicate` on the options struct (matching .NET's customer-facing
Comment on lines +201 to +205
pub fn with_remove<P>(self, path: P) -> Self
where P: Into<Cow<'static, str>> { ... }
pub fn with_filter_predicate<C>(self, condition: C) -> Self
where C: Into<Cow<'static, str>> { ... }
}
Comment on lines +220 to +223
conversion on the user's value can fail without panicking. (This matches the
prior API's design intent.)
- The trait `ToJsonNumber` from the prior code is **not** restored — it was
only used by `with_increment`, which is deferred.
Comment on lines +120 to +122

A reviewer can confirm "done" by checking:

listed in M6.
2. Add a `CosmosOperation::patch_item(reference, patch_document) -> CosmosOperation`
factory in `azure_data_cosmos_driver::models::cosmos_operation` following the
existing `replace_item` factory pattern (line ~441).
Addresses PR Deep Reviewer iteration 2 findings:
- N1: AC12 / section 6.3 conjunction test now enumerates all four
  etag x predicate combinations including (no-match, no-match), and
  notes the server-side eval-order tie-break is observable but not pinned.
- N2 + N3: Remove-op byte-exact assertion now includes the
  {"operations":[...]} wrapping array, which also pins the
  "operations" top-level wire field name. M5 invariants list extended
  to call out "operations" as plural lowercase.
- N4: AC13 / soft-delete-idempotent test reworded to be honest
  about what it tests (application-level repeat safety, not
  transparent retry). Real retry-pathway property tracked as a
  fault-injection follow-up under M8.
- N5: Section 10 phase-2 constraint 2 "major-version bump" replaced
  with concrete language about driver-boundary observable retry
  behavior; driver crate has no formal semver contract.
- N6: PatchItemOptions::item_options changed from pub to pub(crate)
  with rationale comment so phase 2 can add patch-specific invariants
  without a breaking change. Mandates pass-through builders for every
  applicability-table field.
- N7: Header-block reference to .coding-harness/landscape-report.md
  notes the PR description (Azure#4338) is the durable copy.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Captures discussion raised by @FabianMeiswinkel on the time-savings
of phase-1 op-narrowing vs. starting with RMW, and the alternative
of requiring If-Match etag on every patch instead.

- Section 7.5 (Alternative E): mandatory-etag for phase-1 — explicitly
  rejected for phase 1 because it regresses the canonical
  `set /deleted = true` soft-delete workload from one RU to two
  (extra read for the etag), and because 412 on transparent retry
  is ambiguous between duplicate-success and real conflict — the
  driver cannot disambiguate without a re-read, which is itself
  the RMW alternative we rejected.
- Section 10 phase-2 options: add option (d) mandatory-etag-for-
  non-idempotent-ops. Trade-offs differ from phase-1: most
  counter/move workloads already have an etag from a prior read,
  so the extra-read cost vanishes, and the caller is in a better
  position to disambiguate 412 than the SDK.
- Section 11 question 5: explicit open question to FabianMeiswinkel
  on RMW vs. mandatory-etag preference for phase 2.

The key insight: the right idempotency answer differs across the
op set, which itself justifies phase-1 narrowing rather than a
one-size-fits-all retry policy.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@NaluTripician
Copy link
Copy Markdown
Author

@FabianMeiswinkel — picking up your question on whether phase-1 actually saves time vs. starting with RMW, and your suggestion of disallowing conditions initially. Short version: phase-1 saves the RMW loop subsystem specifically, not the whole spec; and disallowing the condition field doesn't help idempotency, but mandatory If-Match etag does and is worth a closer look — though not for phase 1 IMO. I just pushed c8f8647eb to capture this in §7.5, §10, and §11.5 of the spec so it lands in the durable design record.

What phase-1 actually defers vs. what it doesn't

The bulk of the work — PatchOperation / PatchDocument types, OperationType::Patch wiring on the driver enum, patch_item driver method, fault-injection mapping, the wire-body invariants (lowercase op tokens, condition rename), the public-API tests — has to ship regardless. Restoring PATCH at all without those is not viable. So if you compare "phase-1 with op-narrowing" to "phase-1 with RMW", what actually moves between them is just the RMW loop:

  • read-with-retry-budget,
  • modify (apply ops client-side),
  • write with If-Match + retry-on-412 with re-read,
  • persistent-conflict surfacing semantics,
  • the RU-charge story for 2+ requests under one logical "PatchItem",
  • and the round-trip observability (metrics / diagnostics need to know it's now a compound op).

That's the work phase-1 saves. It's real, but it's also the easiest part to design wrong without a spec, which is why I'd argue it's exactly the right thing to defer.

"Disallow conditions initially" — two readings

Reading 1 — disallow filter_predicate (the wire condition field). This is orthogonal to idempotency. Filter-predicate is a server-side guard on the target document's state ("only patch if /status == "active""); it doesn't change the retry-safety of the operations in the patch. A patch with incr /count 1 is non-idempotent whether or not it has a predicate, and a patch with only set is idempotent whether or not it has one. Removing predicates buys no retry safety and removes a useful hook for soft-delete patterns ("set /deleted = true if it isn't already"). I don't think this is the reading you intended, but flagging in case.

Reading 2 — require If-Match etag on every patch. This is substantive and I think this is what you meant. The argument: the server returns 412 on a duplicate apply because the etag has advanced, giving us at-most-once semantics without an in-driver RMW loop.

This is captured in spec §7.5 as Alternative E. For phase 1 I rejected it for two reasons:

  1. Regresses the headline workload. OpenAI's largest stated PATCH use case is set /deleted = true on a known doc id — a one-RU operation today. Mandatory-etag forces a read_item first to obtain an etag, so it becomes two requests / two RU charges / two round trips for the very workload phase-1 most wants to enable.

  2. 412-on-retry is ambiguous under our retry-all-writes policy. If is_idempotent() == true and the driver retries transparently, and the second attempt returns 412, the driver cannot tell whether (a) the first attempt succeeded and the etag is stale (a "duplicate-success" retry — should report success), or (b) a concurrent writer changed the document (a real conflict — should report 412). Disambiguation requires a re-read, which is exactly the RMW loop. Surfacing the raw 412 to callers regardless makes us substantially less ergonomic than .NET / Java, both of which allow unconditional patches.

For phase 2 the trade-offs invert and I think mandatory-etag becomes genuinely attractive — added as option (d) in §10 alongside the existing RMW / PatchRmw variant / per-instance is_idempotent options. Reasoning: most realistic counter/move workloads ("I just read this counter and want to bump it") already have an etag in hand from a prior read, so the extra-read cost vanishes; and a caller running incr is already in a better position than the SDK to decide what to do with a 412 (re-read and retry their own logic vs. give up). The driver can offer a PatchOperation::Increment variant whose constructor requires an etag value — type-system-enforced — and incr patches just don't compose with cancel-this-write retry pathways at all. That's cleaner than RMW IMO, at the cost of a stricter public API.

Bottom line

Time savings of phase-1 over starting-with-RMW: real, on the order of "the entire RMW loop subsystem and its retry/observability story", but not "the whole spec".

Mandatory-etag-as-phase-1-alternative-to-narrowing: rejected, see §7.5.

Mandatory-etag-as-phase-2-alternative-to-RMW: live and now spec'd as option (d) in §10. Open question §11.5 is explicitly directed to you on whether you'd prefer it over RMW for phase 2.

Happy to switch §10's framing to recommend (d) if that matches your gut, or to keep the spec position-neutral and decide when phase-2 is scoped. Either is fine by me.

@github-project-automation github-project-automation Bot moved this from Todo to Done in CosmosDB Go/Rust Crew May 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Cosmos The azure_cosmos crate

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants