Merged
4 changes: 2 additions & 2 deletions adapter/sqs_catalog.go
@@ -118,7 +118,7 @@ type sqsQueueMeta struct {
// along with the rest of the queue.
Throttle *sqsQueueThrottle `json:"throttle,omitempty"`
// PartitionCount is the number of FIFO partitions for this queue
-// (Phase 3.D HT-FIFO, see docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md).
+// (Phase 3.D HT-FIFO, see docs/design/2026_04_26_partial_sqs_split_queue_fifo.md).
// Zero or 1 means the legacy single-partition layout — no schema
// change. Greater than 1 enables HT-FIFO. Set at CreateQueue time
// and immutable thereafter (SetQueueAttributes rejects any change).
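A minimal Go sketch of the PartitionCount rule in the comment above: 0 and 1 collapse to the legacy single-partition layout, and only values greater than 1 switch on HT-FIFO. `queueMeta`, `isPartitioned`, and `effectivePartitions` are hypothetical names, and modelling the field as `uint32` is an assumption because its type is not visible in this hunk.

```go
package main

import "fmt"

// queueMeta is a hypothetical stand-in for sqsQueueMeta that models only
// the PartitionCount field; the uint32 type is an assumption.
type queueMeta struct {
	PartitionCount uint32
}

// isPartitioned reports whether the queue uses the HT-FIFO layout.
// Zero (for example, metadata written before the field existed decodes
// to the zero value) and 1 both mean the legacy single-partition layout.
func isPartitioned(m queueMeta) bool {
	return m.PartitionCount > 1
}

// effectivePartitions returns how many partitions a reader must visit:
// legacy queues always have exactly one.
func effectivePartitions(m queueMeta) uint32 {
	if !isPartitioned(m) {
		return 1
	}
	return m.PartitionCount
}

func main() {
	for _, n := range []uint32{0, 1, 4} {
		m := queueMeta{PartitionCount: n}
		fmt.Printf("PartitionCount=%d partitioned=%t partitions=%d\n",
			n, isPartitioned(m), effectivePartitions(m))
	}
}
```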
@@ -478,7 +478,7 @@ var sqsAttributeAppliers = map[string]attributeApplier{
return nil
},
// PartitionCount enables HT-FIFO when > 1 (Phase 3.D, see
-// docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md). Set
+// docs/design/2026_04_26_partial_sqs_split_queue_fifo.md). Set
// at CreateQueue time; attempts to change it via SetQueueAttributes are
// rejected by the immutability check in trySetQueueAttributesOnce.
// PartitionCount > 1 is gated by validateHTFIFOCapability (the
2 changes: 1 addition & 1 deletion adapter/sqs_keys.go
@@ -50,7 +50,7 @@ const (
)

// HT-FIFO partitioned-keyspace discriminator. Per the §3.1 design in
-// docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md, partitioned
+// docs/design/2026_04_26_partial_sqs_split_queue_fifo.md, partitioned
// FIFO queues live in a separate keyspace so the legacy single-
// partition layout can stay byte-identical on disk:
//
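To make the two layouts concrete, here is a sketch of a message-data key prefix constructor, assuming the `!sqs|msg|data|<queue>|…` and `!sqs|msg|data|p|<queue>|<partition>|…` shapes given in the design doc's explanation of the temporary gate. `msgDataKeyPrefix` is hypothetical, and the plain-text encoding of the queue name and partition index is an assumption; the adapter's real `sqsMsg*Key` constructors may encode these components differently.

```go
package main

import (
	"fmt"
	"strconv"
)

// Prefixes as written in the design doc's description of the two
// layouts. Encoding the queue name and partition index as plain text
// is an assumption made for readability.
const (
	msgDataPrefix  = "!sqs|msg|data|"
	partitionedTag = "p|"
)

// msgDataKeyPrefix is a hypothetical constructor. Legacy queues
// (partitionCount 0 or 1) keep the byte-identical single-partition
// prefix and ignore partitionIndex; partitioned queues get the "p|"
// discriminator followed by the queue and the partition index.
func msgDataKeyPrefix(queue string, partitionCount, partitionIndex uint32) string {
	if partitionCount <= 1 {
		return msgDataPrefix + queue + "|"
	}
	return msgDataPrefix + partitionedTag + queue + "|" +
		strconv.FormatUint(uint64(partitionIndex), 10) + "|"
}

func main() {
	fmt.Println(msgDataKeyPrefix("orders.fifo", 1, 0)) // !sqs|msg|data|orders.fifo|
	fmt.Println(msgDataKeyPrefix("orders.fifo", 4, 2)) // !sqs|msg|data|p|orders.fifo|2|
}
```

Keying the layout on whether the queue is partitioned, rather than on the partition index, is what keeps partition 0 of a partitioned queue out of the legacy keyspace.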
2 changes: 1 addition & 1 deletion adapter/sqs_partitioning.go
@@ -7,7 +7,7 @@ import (

// HT-FIFO (Phase 3.D split-queue FIFO) configuration vocabulary and
// the routing primitive partitionFor. See the design doc at
-// docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md.
+// docs/design/2026_04_26_partial_sqs_split_queue_fifo.md.
//
// PR 2 of the §11 rollout introduces the schema fields plus the
// validation surface — including the temporary dormancy gate that
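A hedged sketch of what a mask-based `partitionFor` can look like. The diff does not show the real hash function or routing input, so FNV-1a over the MessageGroupId is an assumption; the mask only reaches every partition when the count is a power of two, which this sketch expects callers to have validated.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor sketches a mask-based router. With a power-of-two
// partitionCount, hash & (partitionCount-1) always lands in
// [0, partitionCount). FNV-1a and keying on MessageGroupId are
// assumptions; the real hash and input are not shown in this diff.
func partitionFor(messageGroupID string, partitionCount uint32) uint32 {
	if partitionCount <= 1 {
		return 0 // legacy single-partition queue
	}
	h := fnv.New32a()
	h.Write([]byte(messageGroupID)) // writes to an FNV hash never fail
	return h.Sum32() & (partitionCount - 1)
}

func main() {
	for _, g := range []string{"customer-1", "customer-2", "customer-3"} {
		fmt.Printf("%s -> partition %d of 8\n", g, partitionFor(g, 8))
	}
}
```

Under these assumptions every message in one group hashes to the same partition, so per-group ordering only has to be maintained inside a single partition.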
docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md → docs/design/2026_04_26_partial_sqs_split_queue_fifo.md
@@ -1,6 +1,6 @@
# Split-Queue FIFO for the SQS Adapter

-**Status:** Proposed
+**Status:** Partial
**Author:** bootjp
**Date:** 2026-04-26

@@ -387,16 +387,18 @@ This is out of scope here.

## 11. Rollout Plan (Multi-PR)

-| PR | Content | Reviewable in isolation? |
-|---|---|---|
-| 1 | This proposal doc lands. Operators have time to flag concerns. | Yes |
-| 2 | Schema: `sqsQueueMeta.PartitionCount`, `DeduplicationScope`, `FifoThroughputLimit`. Routing function `partitionFor`. CreateQueue / SetQueueAttributes validation including the §3.2 cross-attribute rules. **Temporary feature gate** (see below): `CreateQueue` rejects `PartitionCount > 1` with `InvalidAttributeValue` ("PartitionCount > 1 requires HT-FIFO data plane — not yet enabled") so the schema field exists in the meta type but cannot land in production data. | Yes (catalog only) |
-| 3 | Keyspace: thread `partitionIndex` through every `sqsMsg*Key` constructor, defaulting to 0 so existing queues stay byte-identical. Gate from PR 2 still in place — `PartitionCount > 1` remains rejected. | Yes (mechanical) |
-| 4 | Routing layer: `kv/shard_router.go` accepts the `(queue, partition)` key. New `--sqsFifoPartitionMap` flag (separate from the existing `--raftSqsMap` endpoint-mapping flag). Mixed-version gate (§8.5 capability advertisement via `/sqs_health` + catalog polling for `CreateQueue` gating, **and** the §8 leadership-refusal hook in `kv/lease_state.go` that calls `TransferLeadership` when a non-`htfifo` binary discovers a partitioned queue in its shard on startup or leadership acquisition — both components are required before the binary is marked `htfifo`-eligible). PR 2's temporary `PartitionCount > 1` rejection still in place. | Yes (operator-config) |
-| 5 | Send / Receive partition fanout. Receipt-handle v2 codec. **Removes the PR 2 `PartitionCount > 1` rejection** in the same commit that wires the data-plane fanout — the gate and its lift land atomically so a half-deployed cluster can never accept a partitioned queue without the data plane to serve it. | Yes (data-plane) |
-| 6 | PurgeQueue / DeleteQueue partition iteration. Tombstone schema update. Reaper update. | Yes (control-plane) |
-| 7 | Jepsen HT-FIFO workload. Metrics. | Yes (testing) |
-| 8 | Partial-doc lifecycle bump: 3.D moves from TODO to Landed. Section 13 from §16.6 of the partial doc gets the as-built record. | Yes (docs) |
+**Status as of 2026-05-04**: PRs 1–7 are merged on `main`. The doc is being moved from `proposed` to `partial` in PR 8 (this rename) because every milestone in the rollout plan that produces shippable code has landed. The "partial" classification rather than "implemented" leaves room for future work tracked in §10 / §12 (e.g. operator-configurable hash, online resharding, cross-partition transactional admin) — none of which are in this proposal's scope but each of which would be an extension to the same surface.
+
+| PR | Content | Reviewable in isolation? | Status |
+|---|---|---|---|
+| 1 | This proposal doc lands. Operators have time to flag concerns. | Yes | ✅ Merged (#664) |
+| 2 | Schema: `sqsQueueMeta.PartitionCount`, `DeduplicationScope`, `FifoThroughputLimit`. Routing function `partitionFor`. CreateQueue / SetQueueAttributes validation including the §3.2 cross-attribute rules. **Temporary feature gate** (see below): `CreateQueue` rejects `PartitionCount > 1` with `InvalidAttributeValue` ("PartitionCount > 1 requires HT-FIFO data plane — not yet enabled") so the schema field exists in the meta type but cannot land in production data. | Yes (catalog only) | ✅ Merged (#681) |
+| 3 | Keyspace: thread `partitionIndex` through every `sqsMsg*Key` constructor, defaulting to 0 so existing queues stay byte-identical. Gate from PR 2 still in place — `PartitionCount > 1` remains rejected. | Yes (mechanical) | ✅ Merged (#703) |
+| 4 | Routing layer: `kv/shard_router.go` accepts the `(queue, partition)` key. New `--sqsFifoPartitionMap` flag (separate from the existing `--raftSqsMap` endpoint-mapping flag). Mixed-version gate (§8.5 capability advertisement via `/sqs_health` + catalog polling for `CreateQueue` gating, **and** the §8 leadership-refusal hook in `kv/lease_state.go` that calls `TransferLeadership` when a non-`htfifo` binary discovers a partitioned queue in its shard on startup or leadership acquisition — both components are required before the binary is marked `htfifo`-eligible). PR 2's temporary `PartitionCount > 1` rejection still in place. | Yes (operator-config) | ✅ Merged across 4-A / 4-B-1 / 4-B-2 / 4-B-3a / 4-B-3b (#704, #708, #715, #721, #723) |
+| 5 | Send / Receive partition fanout. Receipt-handle v2 codec. **Removes the PR 2 `PartitionCount > 1` rejection** in the same commit that wires the data-plane fanout — the gate and its lift land atomically so a half-deployed cluster can never accept a partitioned queue without the data plane to serve it. | Yes (data-plane) | ✅ Merged across 5a / 5b-1 / 5b-2 / 5b-3 (#724, #731, #732, #734) |
+| 6 | PurgeQueue / DeleteQueue partition iteration. Tombstone schema update. Reaper update. | Yes (control-plane) | ✅ Merged across 6a / 6b (#735, #736) |
+| 7 | Jepsen HT-FIFO workload. Metrics. | Yes (testing) | ✅ Merged across 7a / 7b (#737, #738) |
+| 8 | Partial-doc lifecycle bump: rename `proposed` → `partial`, annotate §11 with shipped PR anchors, update in-tree source references that point at the proposed-stage filename. | Yes (docs) | 🟡 In flight (this PR) |

**Why the temporary gate** (Codex P1 on PR #664 tenth-round Codex review): without it, a cluster running PR 2–4 would accept a `CreateQueue` with `PartitionCount = 4` (the schema is in place, the validator only checks per-attribute validity) and then dispatch every subsequent `SendMessage` against the **legacy single-partition keyspace** with `partitionIndex = 0` — silently writing all messages under `!sqs|msg|data|<queue>|…` regardless of `PartitionCount`. When PR 5 lands and the new fanout reader looks for messages under the partitioned prefix `!sqs|msg|data|p|<queue>|<partition>|…`, every message written during the PR 2–4 window is invisible to it and to the partition-aware reaper scan. The gate-and-lift pattern (PR 2 rejects, PR 5 lifts in the same commit as the data-plane fanout) makes it impossible to land data under the wrong layout: any cluster that accepts `PartitionCount > 1` is, by construction, also running the partition-aware send path.

2 changes: 1 addition & 1 deletion main_sqs_leadership_refusal.go
@@ -23,7 +23,7 @@ type sqsLeadershipController interface {
// observer that refuses leadership of any Raft group hosting a
// partitioned FIFO queue when this binary does NOT advertise the
// htfifo capability. Implements §8 of
-// docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md.
+// docs/design/2026_04_26_partial_sqs_split_queue_fifo.md.
//
// # What it protects against
//
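A sketch of the §8 leadership-refusal decision, under stated assumptions: `leadershipTransferrer`, `refusalObserver`, and `fakeController` are hypothetical names, the real `sqsLeadershipController` interface is not visible in this hunk, and `TransferLeadership` taking no arguments is an assumption. The logic follows what the rollout plan spells out: a binary without `htfifo` that finds a partitioned queue in a group it leads hands leadership away.

```go
package main

import (
	"errors"
	"fmt"
)

// leadershipTransferrer is a hypothetical, narrowed stand-in for
// sqsLeadershipController; its method signature is an assumption.
type leadershipTransferrer interface {
	TransferLeadership() error
}

// refusalObserver models the hook run on startup or leadership
// acquisition for one Raft group.
type refusalObserver struct {
	htfifoCapable       bool
	hasPartitionedQueue func(groupID uint64) bool
	controller          leadershipTransferrer
}

// onLeadershipAcquired reports whether leadership was refused. A refusal
// whose transfer fails is still reported as refused, with the error.
func (o refusalObserver) onLeadershipAcquired(groupID uint64) (bool, error) {
	if o.htfifoCapable || !o.hasPartitionedQueue(groupID) {
		return false, nil
	}
	if err := o.controller.TransferLeadership(); err != nil {
		return true, fmt.Errorf("group %d: transfer leadership: %w", groupID, err)
	}
	return true, nil
}

// fakeController counts transfers so the sketch runs without Raft.
type fakeController struct {
	transfers int
	fail      bool
}

func (f *fakeController) TransferLeadership() error {
	if f.fail {
		return errors.New("no eligible follower")
	}
	f.transfers++
	return nil
}

func main() {
	fc := &fakeController{}
	o := refusalObserver{
		htfifoCapable:       false,
		hasPartitionedQueue: func(uint64) bool { return true },
		controller:          fc,
	}
	refused, err := o.onLeadershipAcquired(7)
	fmt.Println(refused, err, fc.transfers) // true <nil> 1
}
```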
2 changes: 1 addition & 1 deletion shard_config.go
@@ -41,7 +41,7 @@ var (

// sqsFifoPartitionMaxPartitions caps the per-queue partition count so
// the partitionFor mask + bucket-store sizing arguments in
-// docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md §3.1 stay
+// docs/design/2026_04_26_partial_sqs_split_queue_fifo.md §3.1 stay
// honest: 32 partitions × ~1k RPS per shard ≈ 30k aggregate RPS per
// queue, which matches the design's stated ceiling. Operators who
// need more should split the workload across queues rather than
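The sizing argument in the `shard_config.go` comment is simple arithmetic: 32 partitions × 1,000 RPS = 32,000 RPS, which the comment rounds to about 30k. The sketch below pairs that calculation with a hypothetical `validatePartitionCount` that enforces the cap. Requiring a power of two is an assumption inferred from the comment's reference to the `partitionFor` mask, and the value 32 is taken from the comment rather than from the constant's definition.

```go
package main

import (
	"fmt"
	"math/bits"
)

// sqsFifoPartitionMaxPartitions mirrors the cap described in the
// comment (32); the authoritative value lives in shard_config.go.
const sqsFifoPartitionMaxPartitions = 32

// validatePartitionCount is hypothetical. 0 and 1 mean the legacy
// layout; larger values must not exceed the cap and (assumption) must
// be a power of two so a bit mask can route across all partitions.
func validatePartitionCount(n uint32) error {
	if n <= 1 {
		return nil
	}
	if n > sqsFifoPartitionMaxPartitions {
		return fmt.Errorf("PartitionCount %d exceeds maximum %d", n, sqsFifoPartitionMaxPartitions)
	}
	if bits.OnesCount32(n) != 1 {
		return fmt.Errorf("PartitionCount %d is not a power of two", n)
	}
	return nil
}

// aggregateRPS is the comment's back-of-envelope ceiling:
// partitions multiplied by per-shard throughput.
func aggregateRPS(partitions, perShardRPS uint32) uint32 {
	return partitions * perShardRPS
}

func main() {
	fmt.Println(aggregateRPS(sqsFifoPartitionMaxPartitions, 1000)) // 32000
	for _, n := range []uint32{8, 12, 64} {
		fmt.Println(n, validatePartitionCount(n))
	}
}
```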