docs(sqs): address two open Codex P1 mediums (control-plane dedup gate, PR 2 dormancy gate)
Two Codex P1 findings from the tenth-round Codex review on commit
14b4d88 that were carried forward to today's eleventh-round
Claude review:
3.D split-queue FIFO §3.2 / §4.1 -- {PartitionCount > 1,
DeduplicationScope = "queue"} validation moved to control plane
(Codex P1, medium): the old §4.1 paragraph rejected this
combination at SendMessage time, so an operator who mis-configured
at CreateQueue would get a successful response and only discover
the problem when every send failed -- a created-but-unserviceable
queue with no recovery short of DeleteQueue+CreateQueue. Added a
new "Cross-attribute validation at CreateQueue and
SetQueueAttributes" paragraph to §3.2 with the rejection rule
(InvalidParameterValue plus the AWS-shaped reason "queue-scoped
deduplication is incompatible with multi-partition FIFO because
the dedup key cannot be globally unique across partitions without
a cross-partition OCC transaction"). Reframed the §4.1 paragraph
as a "cannot reach this code" pointer to the §3.2 gate so the
runtime rejection is impossible.
3.D split-queue FIFO §11 -- explicit dormancy gate for PR 2-4
(Codex P1, medium): "feature is dormant" was editorial intent, not
a runtime guarantee. A cluster on PR 2-4 would accept
CreateQueue(PartitionCount=4), dispatch every SendMessage against
the legacy single-partition keyspace (partitionIndex=0 because the
data plane has not been wired yet), leaving those messages
invisible to the partition-aware fanout reader and reaper that
land in PR 5. PR 2 row now adds a temporary CreateQueue rejection
("PartitionCount > 1 requires HT-FIFO data plane -- not yet
enabled"); PR 3-4 rows note the gate is still in place; PR 5 row
explicitly removes it in the same commit that wires the fanout so
the gate-and-lift land atomically. New paragraph below the table
documents why the gate exists and why atomic gate-and-lift makes
the wrong-layout-data class of bug impossible.
docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md (9 additions, 5 deletions)
@@ -112,6 +112,8 @@ type sqsQueueMeta struct {
The pattern is the same: each attribute participates in a routing or dedup decision whose correctness depends on every existing message having been written under a single, consistent value. Live mutation creates a "before" set and an "after" set with incompatible invariants that the runtime cannot reconcile without a full drain.
**Cross-attribute validation at `CreateQueue` and `SetQueueAttributes`** (Codex P1 on PR #664 tenth-round Codex review): the same validator that enforces immutability also rejects incoherent attribute combinations *before* the queue is created. Specifically, `{PartitionCount > 1, DeduplicationScope = "queue"}` is rejected with `InvalidParameterValue` ("queue-scoped deduplication is incompatible with multi-partition FIFO because the dedup key cannot be globally unique across partitions without a cross-partition OCC transaction"). Without this control-plane gate, an operator who misconfigures the combination at `CreateQueue` gets a successful response and only discovers the problem when every subsequent `SendMessage` fails — a created-but-unserviceable queue with no recovery short of `DeleteQueue`+`CreateQueue`. Any other invalid combinations the implementation discovers during PR 2 should land in the same validator (and the §4.x rejection paragraphs that mention them should be reframed as "cannot reach this code" notes, not runtime rejections).
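For concreteness, a minimal Go sketch of what this gate could look like; `validateCrossAttributes` and the struct subset are illustrative names for this note, not the repo's actual API:

```go
package sqs

import "errors"

// Illustrative subset of sqsQueueMeta: only the three attributes the §3.2
// validator cares about. The real struct has more fields.
type sqsQueueMeta struct {
	PartitionCount      int
	DeduplicationScope  string // "messageGroup" or "queue"
	FifoThroughputLimit string
}

// validateCrossAttributes runs at CreateQueue and SetQueueAttributes, before
// anything is persisted, so an incoherent combination can never become a
// created-but-unserviceable queue.
func validateCrossAttributes(meta sqsQueueMeta) error {
	if meta.PartitionCount > 1 && meta.DeduplicationScope == "queue" {
		return errors.New("InvalidParameterValue: queue-scoped deduplication is " +
			"incompatible with multi-partition FIFO because the dedup key cannot " +
			"be globally unique across partitions without a cross-partition OCC " +
			"transaction")
	}
	// Any further invalid combinations discovered during PR 2 land here.
	return nil
}
```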
**Enforcement (gate is `CreateQueue`, not first send)**: when `SetQueueAttributes` is called, the validator loads the current `sqsQueueMeta` (one Raft-consistent point read against the catalog — already required for the OCC compare-and-set) and rejects with `InvalidAttributeValue` if any of the three attributes in the request differs from the value already on the meta. **`SetQueueAttributes` is all-or-nothing**: if any immutable attribute in the request carries a differing value, the entire request is rejected before any attribute is persisted — including the *mutable* attributes in the same call (e.g. `VisibilityTimeout` paired with an attempted `PartitionCount` change is rejected as a whole; the `VisibilityTimeout` change does not commit on its own). The §9 immutability test pins this rule. No range scan over the message keyspace is required; no `firstSendAt` timestamp needs to be added; no concurrent-send race exists, because the meta value is set once at `CreateQueue` commit and never changes thereafter. This matches AWS's published behaviour ("you can't change the queue type after you create it" extends to `FifoThroughputLimit` and `DeduplicationScope` in HT-FIFO queues), so SDK clients see the same rejection envelope on the same request as on AWS proper. Picking the create-time gate over a first-send gate is also defensible from a correctness lens: the corner case "operator creates a queue with `PartitionCount=8` and changes their mind to `PartitionCount=4` before any producer connects" is rare and can be solved by `DeleteQueue`+`CreateQueue` (which the operator can also do post-first-send for any other reason). The simplicity of a stateless validator is worth more than the vanishingly small set of operators who would benefit from a brief mutability window.
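A hedged sketch of the all-or-nothing rule, reusing the `sqsQueueMeta` subset from the sketch above; `rejectIfImmutableChanged` and the string-map request shape are assumptions made for illustration:

```go
package sqs

import (
	"fmt"
	"strconv"
)

// rejectIfImmutableChanged enforces the all-or-nothing rule: if any immutable
// attribute in the request differs from the value on the meta (loaded by the
// single Raft-consistent point read described above), the whole request fails
// before any attribute, mutable ones included, is persisted.
func rejectIfImmutableChanged(meta sqsQueueMeta, req map[string]string) error {
	current := map[string]string{
		"PartitionCount":      strconv.Itoa(meta.PartitionCount),
		"DeduplicationScope":  meta.DeduplicationScope,
		"FifoThroughputLimit": meta.FifoThroughputLimit,
	}
	for name, have := range current {
		if want, ok := req[name]; ok && want != have {
			return fmt.Errorf("InvalidAttributeValue: %s cannot be changed after CreateQueue", name)
		}
	}
	// Safe to proceed: only mutable attributes differ; commit them via the
	// existing OCC compare-and-set.
	return nil
}
```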
### 3.3 Routing
@@ -179,7 +181,7 @@ For deployments that don't want one Raft group per partition (e.g. a small clust
```
leader-proxy path, unchanged).
```
-Steps 1–2 are unchanged; step 3 is the new routing call (~10 lines); steps 4–6 are the existing send path with `partitionIndex` threaded through the key constructors. The dedup record written by step 6 keys on `(queue, partition, MessageGroupId, dedupID)` — when `DeduplicationScope = messageGroup`, this is correct by construction; when `DeduplicationScope = queue`, the validator rejects the request unless `PartitionCount = 1`.
+Steps 1–2 are unchanged; step 3 is the new routing call (~10 lines); steps 4–6 are the existing send path with `partitionIndex` threaded through the key constructors. The dedup record written by step 6 keys on `(queue, partition, MessageGroupId, dedupID)` — when `DeduplicationScope = messageGroup`, this is correct by construction; the `{DeduplicationScope = queue, PartitionCount > 1}` combination cannot reach this code path because the §3.2 control-plane validator rejects it at `CreateQueue` / `SetQueueAttributes` time (see "Cross-attribute validation" in §3.2 below). Rejecting here at `SendMessage` time would mean an operator could create a queue and only discover the misconfiguration on the first send — a created-but-unserviceable state with no recovery short of `DeleteQueue`+`CreateQueue`. The control-plane gate makes the misconfiguration impossible.
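For illustration, a sketch of how step 3's `partitionFor` and step 6's dedup-record key could compose. The FNV hash and the `!sqs|dedup|...` prefix are assumptions: the doc pins only the message-data prefixes, and PR 2 names `partitionFor` without fixing its hash.

```go
package sqs

import (
	"fmt"
	"hash/fnv"
)

// partitionFor is the routing call from step 3. Hashing on MessageGroupId
// keeps every message of a group in one partition, which is what makes
// messageGroup-scoped dedup correct by construction: both copies of a
// duplicate send hash to the same partition and hit the same dedup record.
// partitionCount is validated >= 1 at CreateQueue.
func partitionFor(messageGroupID string, partitionCount int) uint32 {
	h := fnv.New32a()
	h.Write([]byte(messageGroupID))
	return h.Sum32() % uint32(partitionCount)
}

// dedupRecordKey sketches the (queue, partition, MessageGroupId, dedupID)
// key written by step 6. The prefix is illustrative, not the on-disk layout.
func dedupRecordKey(queue string, partition uint32, groupID, dedupID string) string {
	return fmt.Sprintf("!sqs|dedup|%s|%d|%s|%s", queue, partition, groupID, dedupID)
}
```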
### 4.2 ReceiveMessage on a partitioned FIFO
@@ -366,14 +368,16 @@ This is out of scope here.
| PR | Content | Reviewable in isolation? |
|---|---|---|
| 1 | This proposal doc lands. Operators have time to flag concerns. | Yes |
| 2 | Schema: `sqsQueueMeta.PartitionCount`, `DeduplicationScope`, `FifoThroughputLimit`. Routing function `partitionFor`. CreateQueue / SetQueueAttributes validation including the §3.2 cross-attribute rules. **Temporary feature gate** (see below): `CreateQueue` rejects `PartitionCount > 1` with `InvalidAttributeValue` ("PartitionCount > 1 requires HT-FIFO data plane — not yet enabled") so the schema field exists in the meta type but cannot land in production data. | Yes (catalog only) |
| 3 | Keyspace: thread `partitionIndex` through every `sqsMsg*Key` constructor, defaulting to 0 so existing queues stay byte-identical. Gate from PR 2 still in place — `PartitionCount > 1` remains rejected. | Yes (mechanical) |
| 4 | Routing layer: `kv/shard_router.go` accepts the `(queue, partition)` key. New `--sqsFifoPartitionMap` flag (separate from the existing `--raftSqsMap` endpoint-mapping flag). Mixed-version gate. PR 2's temporary `PartitionCount > 1` rejection still in place. | Yes (operator-config) |
| 5 | Send / Receive partition fanout. Receipt-handle v2 codec. **Removes the PR 2 `PartitionCount > 1` rejection** in the same commit that wires the data-plane fanout — the gate and its lift land atomically so a half-deployed cluster can never accept a partitioned queue without the data plane to serve it. | Yes (data-plane) |
| 8 | Partial-doc lifecycle bump: 3.D moves from TODO to Landed. Section 13 from §16.6 of the partial doc gets the as-built record. | Yes (docs) |
**Why the temporary gate** (Codex P1 on PR #664 tenth-round Codex review): without it, a cluster running PR 2–4 would accept a `CreateQueue` with `PartitionCount = 4` (the schema is in place, the validator only checks per-attribute validity) and then dispatch every subsequent `SendMessage` against the **legacy single-partition keyspace** with `partitionIndex = 0` — silently writing all messages under `!sqs|msg|data|<queue>|…` regardless of `PartitionCount`. When PR 5 lands and the new fanout reader looks for messages under the partitioned prefix `!sqs|msg|data|p|<queue>|<partition>|…`, every message written during the PR 2–4 window is invisible to it and to the partition-aware reaper scan. The gate-and-lift pattern (PR 2 rejects, PR 5 lifts in the same commit as the data-plane fanout) makes it impossible to land data under the wrong layout: any cluster that accepts `PartitionCount > 1` is, by construction, also running the partition-aware send path.
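A sketch of the gate-and-lift pattern, under the assumption that the gate is keyed off a compile-time constant; `dataPlaneFanoutWired` is hypothetical, standing in for however PR 5 actually wires the fanout (the point is only that the flip and the fanout land in one commit):

```go
package sqs

import "errors"

// dataPlaneFanoutWired is false for the whole PR 2–4 window and flips to true
// in the same PR 5 commit that adds the partition-aware send path, so any
// build that accepts PartitionCount > 1 also ships the code to serve it.
const dataPlaneFanoutWired = false

// checkPartitionGate runs inside CreateQueue validation from PR 2 onward.
func checkPartitionGate(partitionCount int) error {
	if partitionCount > 1 && !dataPlaneFanoutWired {
		return errors.New("InvalidAttributeValue: PartitionCount > 1 requires " +
			"HT-FIFO data plane — not yet enabled")
	}
	return nil
}
```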
**Gate of no return**: PR 5 is the point where a partitioned FIFO queue can hold real data. Once any production cluster runs PR 5 and creates a partitioned queue, rolling back means draining and recreating the queue. PR 1–4 are reversible (no data layout change). Recorded in the PR descriptions.