
Commit fc7c3cb

docs(sqs): address two open Codex P1 mediums (control-plane dedup gate, PR 2 dormancy gate)
Two Codex P1 findings from the tenth-round Codex review on commit 14b4d88 were carried forward to today's eleventh-round Claude review:

- 3.D split-queue FIFO §3.2 / §4.1 -- {PartitionCount > 1, DeduplicationScope = "queue"} validation moved to the control plane (Codex P1, medium). The old §4.1 paragraph rejected this combination at SendMessage time. An operator who misconfigured it at CreateQueue would therefore get a successful response and only discover the problem when every send failed. The result was a created-but-unserviceable queue with no recovery short of DeleteQueue+CreateQueue. This commit adds a new "Cross-attribute validation at CreateQueue and SetQueueAttributes" paragraph to §3.2 with the rejection rule: InvalidParameterValue plus the AWS-shaped reason "queue-scoped deduplication is incompatible with multi-partition FIFO because the dedup key cannot be globally unique across partitions without a cross-partition OCC transaction". It also reframes the §4.1 paragraph as a "cannot reach this code" pointer to the §3.2 gate, so the runtime rejection can no longer be reached.
- 3.D split-queue FIFO §11 -- explicit dormancy gate for PR 2-4 (Codex P1, medium). "Feature is dormant" was editorial intent, not a runtime guarantee. A cluster on PR 2-4 would accept CreateQueue(PartitionCount=4) and dispatch every SendMessage against the legacy single-partition keyspace (partitionIndex=0, because the data plane has not been wired yet). Those messages would then be invisible to the partition-aware fanout reader and reaper that land in PR 5. The PR 2 row now adds a temporary CreateQueue rejection ("PartitionCount > 1 requires HT-FIFO data plane -- not yet enabled"). The PR 3-4 rows note that the gate is still in place. The PR 5 row explicitly removes it in the same commit that wires the fanout, so the gate and its lift land atomically. A new paragraph below the table documents why the gate exists and why the atomic gate-and-lift makes the wrong-layout-data class of bug impossible.
1 parent 829cf97 commit fc7c3cb

1 file changed

Lines changed: 9 additions & 5 deletions


docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md

````diff
@@ -112,6 +112,8 @@ type sqsQueueMeta struct {
 
 The pattern is the same: each attribute participates in a routing or dedup decision whose correctness depends on every existing message having been written under a single, consistent value. Live mutation creates a "before" set and an "after" set with incompatible invariants that the runtime cannot reconcile without a full drain.
 
+**Cross-attribute validation at `CreateQueue` and `SetQueueAttributes`** (Codex P1 on PR #664 tenth-round Codex review): the same validator that enforces immutability also rejects incoherent attribute combinations *before* the queue is created. Specifically, `{PartitionCount > 1, DeduplicationScope = "queue"}` is rejected with `InvalidParameterValue` ("queue-scoped deduplication is incompatible with multi-partition FIFO because the dedup key cannot be globally unique across partitions without a cross-partition OCC transaction"). Without this control-plane gate, an operator who mis-configures the combination at `CreateQueue` gets a successful response and only discovers the problem when every subsequent `SendMessage` fails — a created-but-unserviceable queue with no recovery short of `DeleteQueue`+`CreateQueue`. Any other invalid combinations the implementation discovers during PR 2 should land in the same validator (and the §4.x rejection paragraphs that mention them should be reframed as "cannot reach this code" notes, not runtime rejections).
+
 **Enforcement (gate is `CreateQueue`, not first send)**: when `SetQueueAttributes` is called, the validator loads the current `sqsQueueMeta` (one Raft-consistent point read against the catalog — already required for the OCC compare-and-set) and rejects with `InvalidAttributeValue` if any of the three attributes in the request differs from the value already on the meta. **`SetQueueAttributes` is all-or-nothing**: if any immutable attribute in the request carries a differing value, the entire request is rejected before any attribute is persisted — including the *mutable* attributes in the same call (e.g. `VisibilityTimeout` paired with an attempted `PartitionCount` change is rejected as a whole; the `VisibilityTimeout` change does not commit on its own). The §9 immutability test pins this rule. No range scan over the message keyspace is required; no `firstSendAt` timestamp needs to be added; no concurrent-send race exists, because the meta value is set once at `CreateQueue` commit and never changes thereafter. This matches AWS's published behaviour ("you can't change the queue type after you create it" extends to `FifoThroughputLimit` and `DeduplicationScope` in HT-FIFO queues), so SDK clients see the same rejection envelope on the same request as on AWS proper. Picking the create-time gate over a first-send gate is also defensible from a correctness lens: the corner case "operator creates a queue with `PartitionCount=8` and changes their mind to `PartitionCount=4` before any producer connects" is rare and can be solved by `DeleteQueue`+`CreateQueue` (which the operator can also do post-first-send for any other reason). The simplicity of a stateless validator is worth more than the vanishingly small set of operators who would benefit from a brief mutability window.
 
 ### 3.3 Routing
````

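The two §3.2 rules in this hunk can be sketched as a minimal Go model, assuming a simplified in-memory meta. `validateCrossAttributes` rejects `{PartitionCount > 1, DeduplicationScope = "queue"}`. `applySetQueueAttributes` checks every immutable attribute before persisting anything, so a rejected request leaves even a mutable `VisibilityTimeout` untouched. All names here (`queueAttrs`, `setRequest`, `apiError`, and both functions) are hypothetical. The final assignment stands in for the Raft-backed OCC compare-and-set that the design describes.

```go
package main

// queueAttrs is a simplified, HYPOTHETICAL view of sqsQueueMeta: the three
// attributes §3.2 treats as immutable plus one mutable attribute.
type queueAttrs struct {
	PartitionCount      uint32
	DeduplicationScope  string // "messageGroup" or "queue"
	FifoThroughputLimit string // "perQueue" or "perMessageGroupId"
	VisibilityTimeout   int    // mutable; included to show all-or-nothing
}

// setRequest models a SetQueueAttributes call; a nil field was not sent.
type setRequest struct {
	PartitionCount      *uint32
	DeduplicationScope  *string
	FifoThroughputLimit *string
	VisibilityTimeout   *int
}

// apiError carries an AWS-shaped error code plus a human-readable reason.
type apiError struct{ Code, Message string }

func (e *apiError) Error() string { return e.Code + ": " + e.Message }

// validateCrossAttributes rejects incoherent combinations at CreateQueue and
// SetQueueAttributes time, so SendMessage never observes them.
func validateCrossAttributes(a queueAttrs) error {
	if a.PartitionCount > 1 && a.DeduplicationScope == "queue" {
		return &apiError{"InvalidParameterValue",
			"queue-scoped deduplication is incompatible with multi-partition FIFO"}
	}
	return nil
}

// applySetQueueAttributes validates the whole request before persisting
// anything: one differing immutable attribute rejects the entire call,
// including any mutable attributes it carries.
func applySetQueueAttributes(cur *queueAttrs, req setRequest) error {
	if (req.PartitionCount != nil && *req.PartitionCount != cur.PartitionCount) ||
		(req.DeduplicationScope != nil && *req.DeduplicationScope != cur.DeduplicationScope) ||
		(req.FifoThroughputLimit != nil && *req.FifoThroughputLimit != cur.FifoThroughputLimit) {
		return &apiError{"InvalidAttributeValue",
			"attribute is immutable after CreateQueue"}
	}
	next := *cur // build the new meta on a copy; nothing is visible yet
	if req.VisibilityTimeout != nil {
		next.VisibilityTimeout = *req.VisibilityTimeout
	}
	if err := validateCrossAttributes(next); err != nil {
		return err
	}
	*cur = next // single commit point; stands in for the OCC compare-and-set
	return nil
}
```

The design choice here is to build the new meta on a copy and commit it with a single assignment. That is what makes the call all-or-nothing: a `PartitionCount` change paired with a `VisibilityTimeout` change returns `InvalidAttributeValue`, and the stored meta is unchanged.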
````diff
@@ -179,7 +181,7 @@ For deployments that don't want one Raft group per partition (e.g. a small clust
 leader-proxy path, unchanged).
 ```
 
-Steps 1–2 are unchanged; step 3 is the new routing call (~10 lines); steps 4–6 are the existing send path with `partitionIndex` threaded through the key constructors. The dedup record written by step 6 keys on `(queue, partition, MessageGroupId, dedupID)` — when `DeduplicationScope = messageGroup`, this is correct by construction; when `DeduplicationScope = queue`, the validator rejects the request unless `PartitionCount = 1`.
+Steps 1–2 are unchanged; step 3 is the new routing call (~10 lines); steps 4–6 are the existing send path with `partitionIndex` threaded through the key constructors. The dedup record written by step 6 keys on `(queue, partition, MessageGroupId, dedupID)` — when `DeduplicationScope = messageGroup`, this is correct by construction; the `{DeduplicationScope = queue, PartitionCount > 1}` combination cannot reach this code path because the §3.2 control-plane validator rejects it at `CreateQueue` / `SetQueueAttributes` time (see "Cross-attribute validation" in §3.2 above). Rejecting here at `SendMessage` time would mean an operator could create a queue and only discover the misconfiguration on the first send — a created-but-unserviceable state with no recovery short of `DeleteQueue`+`CreateQueue`. The control-plane gate makes the misconfiguration impossible.
 
 ### 4.2 ReceiveMessage on a partitioned FIFO
 
````

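A small Go model can check two claims in this hunk: that group-scoped dedup is "correct by construction", and that queue scope must be restricted to `PartitionCount = 1`. This is a sketch under stated assumptions. The design names a `partitionFor` routing function, but this diff does not show its hash, so the `partitionFor` below is a hypothetical FNV-1a-mod-N stand-in. `dedupStore` is an in-memory map that stands in for the per-partition dedup records, keyed by the `(queue, partition, MessageGroupId, dedupID)` tuple.

```go
package main

import "hash/fnv"

// partitionFor is a HYPOTHETICAL stand-in for the design's routing function:
// FNV-1a of MessageGroupId modulo PartitionCount. The real hash may differ.
func partitionFor(groupID string, partitionCount uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(groupID)) // writes to a hash.Hash never return an error
	return h.Sum32() % partitionCount
}

// dedupKey mirrors the (queue, partition, MessageGroupId, dedupID) tuple.
type dedupKey struct {
	Queue          string
	Partition      uint32
	MessageGroupID string
	DedupID        string
}

// dedupStore stands in for the dedup records of every partition.
type dedupStore map[dedupKey]bool

// record stores k and reports whether the send is accepted
// (false means it was suppressed as a duplicate).
func (s dedupStore) record(k dedupKey) bool {
	if s[k] {
		return false
	}
	s[k] = true
	return true
}

// sendGroupScoped models DeduplicationScope = messageGroup. The partition is
// derived from the group, so a retry always lands on the record it must see.
func (s dedupStore) sendGroupScoped(queue string, n uint32, group, dedupID string) bool {
	return s.record(dedupKey{queue, partitionFor(group, n), group, dedupID})
}

// sendQueueScopedLocal models queue-scoped dedup if each partition could only
// check its own records (the group is dropped from the key).
func (s dedupStore) sendQueueScopedLocal(queue string, n uint32, group, dedupID string) bool {
	return s.record(dedupKey{queue, partitionFor(group, n), "", dedupID})
}
```

With messageGroup scope, a retried send recomputes the same partition from the same `MessageGroupId` and finds its own dedup record. The `sendQueueScopedLocal` variant shows the failure that the §3.2 validator rules out. With several partitions, two sends of the same dedup ID from different groups can land on different partitions, and a partition-local check cannot see the other record. With `PartitionCount = 1`, both sends land on partition 0 and the duplicate is caught.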

````diff
@@ -366,14 +368,16 @@ This is out of scope here.
 | PR | Content | Reviewable in isolation? |
 |---|---|---|
 | 1 | This proposal doc lands. Operators have time to flag concerns. | Yes |
-| 2 | Schema: `sqsQueueMeta.PartitionCount`, `DeduplicationScope`, `FifoThroughputLimit`. Routing function `partitionFor`. CreateQueue / SetQueueAttributes validation. **No** keyspace changes yet — feature is dormant. | Yes (catalog only) |
-| 3 | Keyspace: thread `partitionIndex` through every `sqsMsg*Key` constructor, defaulting to 0 so existing queues stay byte-identical. | Yes (mechanical) |
-| 4 | Routing layer: `kv/shard_router.go` accepts the `(queue, partition)` key. New `--sqsFifoPartitionMap` flag (separate from the existing `--raftSqsMap` endpoint-mapping flag). Mixed-version gate. | Yes (operator-config) |
-| 5 | Send / Receive partition fanout. Receipt-handle v2 codec. | Yes (data-plane) |
+| 2 | Schema: `sqsQueueMeta.PartitionCount`, `DeduplicationScope`, `FifoThroughputLimit`. Routing function `partitionFor`. CreateQueue / SetQueueAttributes validation including the §3.2 cross-attribute rules. **Temporary feature gate** (see below): `CreateQueue` rejects `PartitionCount > 1` with `InvalidAttributeValue` ("PartitionCount > 1 requires HT-FIFO data plane — not yet enabled") so the schema field exists in the meta type but cannot land in production data. | Yes (catalog only) |
+| 3 | Keyspace: thread `partitionIndex` through every `sqsMsg*Key` constructor, defaulting to 0 so existing queues stay byte-identical. Gate from PR 2 still in place — `PartitionCount > 1` remains rejected. | Yes (mechanical) |
+| 4 | Routing layer: `kv/shard_router.go` accepts the `(queue, partition)` key. New `--sqsFifoPartitionMap` flag (separate from the existing `--raftSqsMap` endpoint-mapping flag). Mixed-version gate. PR 2's temporary `PartitionCount > 1` rejection still in place. | Yes (operator-config) |
+| 5 | Send / Receive partition fanout. Receipt-handle v2 codec. **Removes the PR 2 `PartitionCount > 1` rejection** in the same commit that wires the data-plane fanout — the gate and its lift land atomically so a half-deployed cluster can never accept a partitioned queue without the data plane to serve it. | Yes (data-plane) |
 | 6 | PurgeQueue / DeleteQueue partition iteration. Tombstone schema update. Reaper update. | Yes (control-plane) |
 | 7 | Jepsen HT-FIFO workload. Metrics. | Yes (testing) |
 | 8 | Partial-doc lifecycle bump: 3.D moves from TODO to Landed. Section 13 from §16.6 of the partial doc gets the as-built record. | Yes (docs) |
 
+**Why the temporary gate** (Codex P1 on PR #664 tenth-round Codex review): without it, a cluster running PR 2–4 would accept a `CreateQueue` with `PartitionCount = 4` (the schema is in place, the validator only checks per-attribute validity) and then dispatch every subsequent `SendMessage` against the **legacy single-partition keyspace** with `partitionIndex = 0` — silently writing all messages under `!sqs|msg|data|<queue>|…` regardless of `PartitionCount`. When PR 5 lands and the new fanout reader looks for messages under the partitioned prefix `!sqs|msg|data|p|<queue>|<partition>|…`, every message written during the PR 2–4 window is invisible to it and to the partition-aware reaper scan. The gate-and-lift pattern (PR 2 rejects, PR 5 lifts in the same commit as the data-plane fanout) makes it impossible to land data under the wrong layout: any cluster that accepts `PartitionCount > 1` is, by construction, also running the partition-aware send path.
+
 **Gate of no return**: PR 5 is the point where a partitioned FIFO queue can hold real data. Once any production cluster runs PR 5 and creates a partitioned queue, rolling back means draining and recreating the queue. PR 1–4 are reversible (no data layout change). Recorded in the PR descriptions.
 
 ---
````

