
feat(sqs): wire partitioned-FIFO data plane through dispatch helpers (Phase 3.D PR 5b-2)#732

Open
bootjp wants to merge 1 commit into main from feat/sqs-htfifo-dispatch-wiring

Conversation

@bootjp
Owner

@bootjp bootjp commented May 2, 2026

Summary

Stage 2 of Phase 3.D PR 5b: wire SendMessage / ReceiveMessage / DeleteMessage / ChangeMessageVisibility through the per-key dispatch helpers landed in PR 5b-1 (#731). The §11 PR 2 dormancy gate still rejects PartitionCount > 1 at CreateQueue, so production queues stay on the legacy keyspace and observable behaviour is byte-identical. PR 5b-3 lifts the gate atomically with the capability check.

What's wired

Send paths

  • sendMessage, sendBatchStandardOnce: partition is a compile-time 0 (Standard queues reject PartitionCount > 1 via the cross-attribute validator). Dispatch helpers route to the legacy keyspace on the steady-state path.
  • sendFifoMessage: hashes MessageGroupId once via partitionFor and threads the result through every key construction (data, vis, byage, dedup, group-lock).
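The deterministic group-to-partition mapping described above (FNV-1a over the group id, masked to a power-of-two partition count, per the self-review) can be sketched roughly like this — the real partitionFor belongs to the repo, so the signature and names here are illustrative:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor hashes a MessageGroupId to a stable partition index.
// Assumes partitionCount is a power of two so the bitmask is exact.
func partitionFor(partitionCount uint32, messageGroupID string) uint32 {
	if partitionCount <= 1 {
		return 0 // legacy / perQueue: everything collapses to partition 0
	}
	h := fnv.New32a()
	h.Write([]byte(messageGroupID))
	return h.Sum32() & (partitionCount - 1) // power-of-two mask
}

func main() {
	for _, g := range []string{"orders", "billing", "audit"} {
		fmt.Printf("%s -> partition %d of 4\n", g, partitionFor(4, g))
	}
}
```

Because the hash is keyed only by the group id and the count is immutable, a given group maps to exactly one partition for the lifetime of a queue generation — which is what lets send, receive, and group-lock all agree without coordination.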

Receive fanout

  • scanAndDeliverOnce wraps the scan loop with a per-partition fanout over effectivePartitionCount(meta) iterations. The shared wall-clock + per-call Max budget caps apply across the whole call, not per-partition.
  • sqsMsgCandidate carries a new partition field stamped at scan time, so loadCandidateRecord / expireMessage / commitReceiveRotation / classifyFifoGroupLock all route to the partition the message was originally stored under.

Delete + ChangeMessageVisibility

  • handle.Partition from the v2 receipt handle drives every key construction.
  • loadMessageForDelete / loadAndVerifyMessage invoke the new validateReceiptHandleVersion once meta is in scope.

Receipt-handle dispatch (the contract shift)

  • encodeReceiptHandleDispatch(meta, partition, ...) is the single point that picks v1 vs v2.
  • decodeClientReceiptHandle is now version-agnostic; the queue-aware version check moved to validateReceiptHandleVersion. v1-on-partitioned and v2-on-non-partitioned both surface as ReceiptHandleIsInvalid, preserving the dormancy promise (no v2 wire format is observable from the public API) under the new contract. Since the §11 PR 2 dormancy gate keeps every production queue non-partitioned in this PR, every v2 handle still surfaces as ReceiptHandleIsInvalid downstream — exactly the PR 5a observable behaviour.
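The queue-aware version matrix can be sketched as follows (types and names are illustrative; the actual validateReceiptHandleVersion lives in the adapter and may differ in detail):

```go
package main

import (
	"errors"
	"fmt"
)

var errReceiptHandleInvalid = errors.New("ReceiptHandleIsInvalid")

type queueMeta struct{ PartitionCount uint32 }

type receiptHandle struct {
	Version   int
	Partition uint32
}

// validateVersion applies the queue-aware matrix: v1 handles are only valid
// on non-partitioned queues, v2 only on partitioned ones. Both mismatch
// directions surface as the same client-visible error.
func validateVersion(meta *queueMeta, h *receiptHandle) error {
	if h == nil {
		return errReceiptHandleInvalid // defensive nil branch
	}
	partitioned := meta != nil && meta.PartitionCount > 1
	switch {
	case partitioned && h.Version != 2:
		return errReceiptHandleInvalid // v1 on a partitioned queue
	case !partitioned && h.Version == 2:
		return errReceiptHandleInvalid // v2 on a legacy queue: dormancy promise
	}
	return nil
}

func main() {
	legacy := &queueMeta{PartitionCount: 1}
	part4 := &queueMeta{PartitionCount: 4}
	fmt.Println(validateVersion(legacy, &receiptHandle{Version: 2}))               // rejected
	fmt.Println(validateVersion(part4, &receiptHandle{Version: 2, Partition: 3})) // accepted
}
```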

Reaper

The reaper iterates legacy byAge keys only — partition-iterating enumeration ships in the later partition-reaper PR. buildReapOps / reapOneRecord pass nil meta + partition 0 through dispatch helpers so the keys are byte-identical to the pre-PR-5b layout.

What does NOT change yet

  • CreateQueue still rejects PartitionCount > 1 with InvalidAttributeValue (the §11 PR 2 dormancy gate, lifted atomically in PR 5b-3).
  • Production queues never enter the partitioned dispatch branches in this PR — observable behaviour is byte-identical.
  • The reaper does not yet enumerate partitioned byAge keys (a later PR).

Test plan

10 new tests across the contract surface.

Unit tests (adapter/sqs_keys_dispatch_v2_test.go):

  • TestEncodeReceiptHandleDispatch_PicksVersionByPartitionCount — pins the version dispatch decision across nil / 0 / 1 / 4 metas.
  • TestEncodeReceiptHandleDispatch_LegacyByteIdenticalToV1 — protects the byte-identical guarantee on legacy queues.
  • TestEncodeReceiptHandleDispatch_PerQueueUsesV2 — perQueue + PartitionCount=4 still produces v2 handles (keyspace is partitioned even when partitionFor collapses every group to partition 0).
  • TestValidateReceiptHandleVersion_QueueAwareRules — 10 sub-cases covering the (meta.PartitionCount × handle.Version) matrix, including perQueue + PartitionCount=4.
  • TestValidateReceiptHandleVersion_NilHandle — defensive nil branch.
  • TestValidateReceiptHandleVersion_RejectsV2OnNonPartitioned — named regression for the dormancy guarantee under the new contract.
  • TestSQSMsgVisScanBoundsDispatch_LegacyMatchesLegacy — byte-identical to legacy sqsMsgVisScanBounds on legacy meta.
  • TestSQSMsgVisScanBoundsDispatch_PartitionedUsesPartitionedPrefix — different partitions yield disjoint scan ranges.
  • TestSQSMsgVisScanBoundsDispatch_PerQueueOnPartitionedKeyspace — pins the PR 5b-1 (#731) round 2 forward-note invariant: perQueue + PartitionCount=4 keeps the partitioned vis prefix at partition 0; collapsing to legacy would silently strand send writes.

Integration tests (adapter/sqs_partitioned_dispatch_test.go) install a partitioned meta directly on a queue created via the public API, short-circuiting the dormancy gate without disabling it for production CreateQueue:

  • TestSQSServer_PartitionedFIFO_SendReceiveDeleteRoundTrip — end-to-end smoke test: send 6 groups, receive surfaces all via fanout, every handle is v2, delete via v2 handle, queue is empty afterwards, legacy keyspace stays empty.
  • TestSQSServer_PartitionedFIFO_RejectsV1Handle — forged v1 handle on a partitioned queue surfaces as ReceiptHandleIsInvalid via DeleteMessage and ChangeMessageVisibility.
  • TestSQSServer_PartitionedFIFO_PerQueueCollapsesToPartitionZero — perQueue + PartitionCount=4 receive surfaces every message in one fanout pass; every v2 handle records Partition=0.

Updated TestDecodeClientReceiptHandle_RejectsV2 → TestDecodeClientReceiptHandle_AcceptsV2 to reflect the contract shift (rejection moved from API boundary to meta-aware validateReceiptHandleVersion).

  • go test -race ./adapter/... (targeted SQS scope) clean.
  • golangci-lint run ./adapter/... clean.

Self-review (per CLAUDE.md)

  1. Data loss — Dispatch helpers byte-identical on legacy queues (PartitionCount<=1 routes to legacy constructors). DLQ FIFO computes dlqPartition via partitionFor(dlqMeta, srcRec.MessageGroupId) so cross-queue redrive lands in the right partition. Receive fanout scans every partition. No issue.
  2. Concurrency — Receive fanout iterates partitions sequentially under one shared wall-clock + max budget, so the original per-call SLA is preserved. meta is loaded once and passed by pointer through helpers; no concurrent meta refresh in a single call. validateReceiptHandleVersion runs after the gen check so a SetQueueAttributes race cannot flip the answer (PartitionCount is immutable). go test -race clean.
  3. Performance — Hot path adds 1 nil-check + 1 PartitionCount compare per dispatch. Receive fanout iterates effectivePartitionCount(meta) times: 1 on legacy / perQueue, N on perMessageGroupId with N partitions splitting roughly 1/N of the messages each.
  4. Data consistency — Receipt handle's recorded partition matches the storage partition (commitReceiveRotation uses cand.partition for both newVisKey and the handle). Cross-version handles rejected as ReceiptHandleIsInvalid — no leak between keyspaces. perQueue + PartitionCount=4 invariant pinned by both unit and integration tests.
  5. Test coverage — 10 new tests across the contract surface. Existing legacy-path SQS tests pass unchanged.

Caller audit (semantic-change discipline)

decodeClientReceiptHandle semantics changed from "reject all v2" to "decode any version, defer version validation to meta-aware caller". All 3 production callers audited via grep:

  • parseQueueAndReceipt (sqs_messages.go:1576) → deleteMessageWithRetry → loadMessageForDelete → validateReceiptHandleVersion after meta load. ✅
  • DeleteMessageBatch entry (sqs_messages_batch.go:475) → deleteMessageWithRetry → same as above. ✅
  • ChangeMessageVisibilityBatch entry (sqs_messages_batch.go:575) → changeVisibilityWithRetry → loadAndVerifyMessage → validateReceiptHandleVersion. ✅

The dormancy guarantee (v2 → ReceiptHandleIsInvalid on non-partitioned) is preserved across every caller path.

Follow-ups

  • PR 5b-3 (next): atomic gate-lift + CreateQueue capability check via PollSQSHTFIFOCapability (Phase 3.D PR 4-B-3a, #721). Removes the validatePartitionDormancyGate call and verifies cluster-wide HT-FIFO support before accepting PartitionCount > 1.
  • PR 6: partition-iterating reaper (drops the "legacy byAge only" caveat in buildReapOps).

…(Phase 3.D PR 5b-2)

Stage 2 of PR 5b: wire SendMessage / ReceiveMessage / DeleteMessage
/ ChangeMessageVisibility through the per-key dispatch helpers
landed in PR 5b-1. The section 11 PR 2 dormancy gate still rejects
PartitionCount > 1 at CreateQueue, so production queues remain on
the legacy keyspace and observable behaviour is byte-identical.
PR 5b-3 lifts the gate atomically with the capability check.

Wiring

  - SendMessage (sendMessage, sendBatchStandardOnce): partition is
    a compile-time 0 (Standard queues reject PartitionCount > 1
    via the cross-attribute validator). Dispatch helpers route to
    legacy on the steady-state path.
  - sendFifoMessage: hashes MessageGroupId once via partitionFor
    and threads the result through every key construction (data,
    vis, byage, dedup, group-lock).
  - ReceiveMessage: scanAndDeliverOnce wraps the scan loop with a
    per-partition fanout over effectivePartitionCount(meta)
    iterations. The shared wall-clock + per-call max budget caps
    apply across the whole call, not per-partition. sqsMsgCandidate
    carries the partition field stamped at scan time so downstream
    rotate / delete / expire helpers route to the right partition.
  - DeleteMessage / ChangeMessageVisibility: handle.Partition from
    the v2 receipt handle drives every key construction.

Receipt-handle dispatch

  encodeReceiptHandleDispatch(meta, partition, ...) is the single
  point that picks v1 vs v2. ReceiveMessage on a partitioned queue
  records cand.partition into the handle so DeleteMessage routes
  back to the same partition.

  decodeClientReceiptHandle is now version-agnostic; the queue-
  aware version check moved to validateReceiptHandleVersion which
  runs in loadMessageForDelete / loadAndVerifyMessage once meta is
  in scope. v1 on a partitioned queue and v2 on a non-partitioned
  queue both surface as ReceiptHandleIsInvalid -- preserving the
  dormancy promise (no v2 wire format observable from the public
  API) under the new contract. The section 11 PR 2 dormancy gate
  still in force in this PR keeps every queue non-partitioned, so
  v2 handles still surface as ReceiptHandleIsInvalid downstream,
  the exact PR 5a observable behaviour.

Reaper

  Reaper iterates legacy byAge keys only -- partition-iterating
  enumeration ships in the later partition reaper PR. buildReapOps
  / reapOneRecord pass nil meta + partition 0 through dispatch
  helpers so the keys are byte-identical to the pre-PR-5b layout.

Tests (10 added)

  Unit tests (sqs_keys_dispatch_v2_test.go):
    - TestEncodeReceiptHandleDispatch_PicksVersionByPartitionCount:
      pins the version dispatch decision across nil/0/1/4 metas.
    - TestEncodeReceiptHandleDispatch_LegacyByteIdenticalToV1:
      protects the byte-identical guarantee on legacy queues.
    - TestEncodeReceiptHandleDispatch_PerQueueUsesV2: pins that
      perQueue + PartitionCount=4 still produces v2 handles
      (keyspace is partitioned even when partitionFor collapses
      every group to partition 0).
    - TestValidateReceiptHandleVersion_QueueAwareRules: 10
      sub-cases covering the (meta.PartitionCount x handle.Version)
      matrix, including perQueue + PartitionCount=4.
    - TestValidateReceiptHandleVersion_NilHandle: defensive nil.
    - TestValidateReceiptHandleVersion_RejectsV2OnNonPartitioned:
      named regression for the dormancy guarantee under the new
      contract.
    - TestSQSMsgVisScanBoundsDispatch_LegacyMatchesLegacy:
      byte-identical to legacy sqsMsgVisScanBounds on legacy meta.
    - TestSQSMsgVisScanBoundsDispatch_PartitionedUsesPartitionedPrefix:
      different partitions yield disjoint scan ranges.
    - TestSQSMsgVisScanBoundsDispatch_PerQueueOnPartitionedKeyspace:
      pins the PR 731 round 2 forward-note invariant -- perQueue
      + PartitionCount=4 keeps the partitioned vis prefix at
      partition 0; collapsing to legacy would silently strand
      send writes.

  Integration tests (sqs_partitioned_dispatch_test.go) install a
  partitioned meta directly on a queue created via the public API,
  short-circuiting the dormancy gate without disabling it for
  production CreateQueue:
    - TestSQSServer_PartitionedFIFO_SendReceiveDeleteRoundTrip:
      end-to-end smoke test (send 6 groups, receive surfaces all
      via fanout, every handle is v2, delete via v2 handle, queue
      is empty afterwards, legacy keyspace stays empty).
    - TestSQSServer_PartitionedFIFO_RejectsV1Handle: forged v1
      handle on a partitioned queue surfaces as ReceiptHandleIsInvalid
      via DeleteMessage and ChangeMessageVisibility.
    - TestSQSServer_PartitionedFIFO_PerQueueCollapsesToPartitionZero:
      perQueue + PartitionCount=4 receive surfaces every message
      in one fanout pass; every v2 handle records Partition=0.

  Updated TestDecodeClientReceiptHandle_RejectsV2 to
  TestDecodeClientReceiptHandle_AcceptsV2 to reflect the contract
  shift (rejection moved from API boundary to meta-aware
  validateReceiptHandleVersion).

Self-review (per CLAUDE.md)

1. Data loss -- Dispatch helpers byte-identical on legacy queues
   (PartitionCount<=1 routes to legacy constructors). DLQ FIFO
   computes dlqPartition via partitionFor(dlqMeta, srcRec.MessageGroupId)
   so cross-queue redrive lands in the right partition. Receive
   fanout scans every partition. No issue.
2. Concurrency -- Receive fanout iterates partitions sequentially
   under one shared wall-clock + max budget, so the original
   per-call SLA is preserved. meta is loaded once and passed by
   pointer through helpers; no concurrent meta refresh in a
   single call. validateReceiptHandleVersion runs after the gen
   check so a SetQueueAttributes race cannot flip the answer
   (PartitionCount is immutable). go test -race clean.
3. Performance -- Hot path adds 1 nil-check + 1 PartitionCount
   compare per dispatch. Receive fanout iterates effectivePartitionCount(meta)
   times: 1 on legacy / perQueue, N on perMessageGroupId, with
   N partitions splitting roughly 1/N of the messages each.
4. Data consistency -- Receipt handle's recorded partition matches
   the storage partition (commitReceiveRotation uses cand.partition
   for both newVisKey and the handle). Cross-version handles
   rejected as ReceiptHandleIsInvalid -- no leak between keyspaces.
   perQueue + PartitionCount=4 invariant pinned by both unit and
   integration tests.
5. Test coverage -- 10 new tests across the contract surface.
   Existing legacy-path SQS tests pass unchanged.

go test -race ./adapter/... and golangci-lint clean.
@coderabbitai

coderabbitai Bot commented May 2, 2026

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 25 minutes and 18 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 22fae4d8-e6df-49e4-ac68-079f7cb19320

📥 Commits

Reviewing files that changed from the base of the PR and between c7ed1a1 and 443ceb5.

📒 Files selected for processing (9)
  • adapter/sqs_fifo.go
  • adapter/sqs_keys_dispatch.go
  • adapter/sqs_keys_dispatch_v2_test.go
  • adapter/sqs_messages.go
  • adapter/sqs_messages_batch.go
  • adapter/sqs_partitioned_dispatch_test.go
  • adapter/sqs_reaper.go
  • adapter/sqs_receipt_handle_v2_test.go
  • adapter/sqs_redrive.go

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Review rate limit: 0/1 reviews remaining, refill in 25 minutes and 18 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@bootjp
Owner Author

bootjp commented May 2, 2026

@claude review

@claude

claude Bot commented May 2, 2026

Claude finished @bootjp's task in 8m 28s


Code Review: feat(sqs): wire partitioned-FIFO data plane through dispatch helpers (Phase 3.D PR 5b-2)

  • Gather PR context and understand the request
  • Read changed files (9 adapter files, ~900 additions / ~160 deletions)
  • Analyze dispatch helpers and wiring logic
  • Review test coverage
  • Check for data loss, concurrency, and consistency issues
  • Post complete review

Overall assessment

This is a well-structured, carefully staged PR. The core abstraction — a single dispatch layer (sqs_keys_dispatch.go) that routes to legacy or partitioned key constructors based on meta.PartitionCount — is the right shape, and the dormancy guarantee (no observable change on production queues) holds cleanly. The self-review in the PR description is thorough and accurate on all five CLAUDE.md lenses. There is one medium-severity finding and a few minor notes.


Medium: validateReceiptHandleVersion does not bound-check handle.Partition against meta.PartitionCount

File: adapter/sqs_keys_dispatch.go:165–196

validateReceiptHandleVersion verifies that the handle's version matches the queue's partition mode, but it does not verify that the partition index in a v2 handle is within [0, meta.PartitionCount).

A client that receives a legitimate v2 handle (partition = N) could forge a new v2 handle for the same queue with an out-of-range partition (e.g., partition = 63 on a 4-partition queue). After passing the version check, loadMessageForDelete constructs a key for partition 63, gets ErrKeyNotFound, and returns sqsDeleteNoOp — silently succeeding. This is AWS-idempotent but slightly incorrect semantics for a forged handle. More importantly, it leaks structural information: the client can confirm partition-key-space boundaries exist at that queue by observing the silent success vs. an explicit invalid-handle error.

// Suggested addition in validateReceiptHandleVersion, after the version checks:
if meta != nil && meta.PartitionCount > 1 && handle.Partition >= meta.PartitionCount {
    return errors.New("receipt handle partition out of range")
}

Since PartitionCount is immutable after queue creation (enforced by validatePartitionImmutability), this check is race-safe. The dormancy gate in PR 5b-2 means this can't be triggered today, but it should be in place before the gate is lifted in PR 5b-3.



Minor observations

1. now is computed per-partition in scanAndDeliverPartition (sqs_messages.go:901)

Before this PR, now was computed once in scanAndDeliverOnce. After, it is computed fresh at the start of each scanAndDeliverPartition call, meaning a message that crosses its visibility threshold between partitions 0 and 1 during a fanout receive might be delivered in partition 1's scan but would not have been found in a single-partition equivalent. This is strictly a behavior improvement (AWS does not guarantee anything stronger than eventual delivery anyway), but it is a subtle semantic delta worth noting in the PR description for future readers.

2. Integration test's legacy-keyspace probe end key uses 4 bytes of 0xFF (sqs_partitioned_dispatch_test.go:184)

end = append(end, 0xFF, 0xFF, 0xFF, 0xFF)

The actual message-ID suffix in the key is a base64url-encoded 16-byte UUID (~22 chars). The 4-byte 0xFF suffix works correctly because all base64url characters (A-Z, a-z, 0-9, -, _) have byte values < 0xFF, so any base64url string of any length compares less than [0xFF 0xFF 0xFF 0xFF]. However, using prefixScanEnd(legacyDataPrefix) (already available in the codebase at adapter/prefix_scan.go) would make the intent explicit and match the pattern used in the rest of the reaper code. Not a bug, just a consistency nit.
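For readers unfamiliar with the helper, the standard prefixScanEnd construction the note recommends — the shortest key strictly greater than every key sharing the prefix — looks like this (a generic sketch; the repo's adapter/prefix_scan.go implementation may differ):

```go
package main

import (
	"bytes"
	"fmt"
)

// prefixScanEnd returns the exclusive upper bound for a prefix scan: the
// shortest key greater than every key that starts with prefix. Trailing 0xFF
// bytes cannot be incremented, so they are dropped before the increment.
func prefixScanEnd(prefix []byte) []byte {
	end := append([]byte(nil), prefix...)
	for i := len(end) - 1; i >= 0; i-- {
		if end[i] != 0xFF {
			end[i]++
			return end[:i+1]
		}
	}
	return nil // prefix is all 0xFF: scan to the end of the keyspace
}

func main() {
	p := []byte("queue/42/")
	end := prefixScanEnd(p)
	fmt.Printf("prefix=%q end=%q\n", p, end)
	fmt.Println(bytes.Compare([]byte("queue/42/zzz"), end) < 0) // true: key is inside the bound
}
```

Unlike appending 0xFF bytes, this bound holds regardless of suffix length or alphabet, which is why it reads as the safer default even where the 0xFF trick happens to work.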

3. sqsMsgVisScanBoundsDispatch duplicates the scan-bounds logic from sqsMsgVisScanBounds (sqs_keys_dispatch.go:128–141)

The two functions share identical logic (construct prefix, zero-pad start, increment upper bound for end). A minor refactor opportunity: sqsMsgVisScanBoundsDispatch could delegate to sqsMsgVisScanBounds on the legacy branch rather than re-implementing the bounds math inline. The test TestSQSMsgVisScanBoundsDispatch_LegacyMatchesLegacy catches any divergence, so this is low risk either way.


Correctness verification (key paths)

  • sendFifoMessage key construction — partition source: partitionFor(meta, in.MessageGroupId) ✅
  • commitReceiveRotation vis-key + handle — partition source: cand.partition (stamped at scan) ✅
  • buildDeleteOps vis/byage/group-lock keys — partition source: handle.Partition (v2 handle) ✅
  • changeVisibilityWithRetry old/new vis key — partition source: handle.Partition (v2 handle) ✅
  • buildRedriveOps DLQ keys — partition source: partitionFor(dlqMeta, dlqRec.MessageGroupId) ✅
  • Reaper (buildReapOps) — nil meta + partition 0 → legacy only ✅ (intentional, PR 6 will extend)
  • validateReceiptHandleVersion — version-only check ⚠️ partition bounds not checked

Snapshot consistency: readTS is computed once in scanAndDeliverOnce and threaded identically through all partition fanout calls — consistent multi-partition snapshot. ✅

Group-lock partition coherence: partitionFor is deterministic (FNV-1a, power-of-two mask); a given MessageGroupId maps to exactly one partition for the lifetime of a queue generation. The group lock is always written and read at that same partition. ✅

PartitionCount immutability: validatePartitionImmutability is enforced on SetQueueAttributes, making the meta-dispatch pattern race-safe without re-loading meta inside OCC transactions. ✅

Dormancy preservation: Every code path that could emit a v2 handle is gated on meta.PartitionCount > 1, which CreateQueue continues to reject via validatePartitionDormancyGate until PR 5b-3. ✅


Test coverage

10 new tests are well-targeted. Particular highlights:

  • TestSQSMsgVisScanBoundsDispatch_PerQueueOnPartitionedKeyspace directly pins the forward-note invariant from PR 731 round 2 (perQueue + partitioned keyspace must scan partitioned prefix, not legacy).
  • TestSQSServer_PartitionedFIFO_SendReceiveDeleteRoundTrip sanity-checks the legacy keyspace is empty after all operations, which is the cleanest possible regression gate for future dispatch regressions.
  • TestDecodeClientReceiptHandle_AcceptsV2 correctly documents the contract shift.

Missing: a test that a v2 handle with Partition >= meta.PartitionCount is rejected (tracking the medium finding above).


Verdict: Ready to merge after the partition bounds check is added to validateReceiptHandleVersion and covered by a test. The rest of the implementation is sound.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request implements the data plane wiring for partitioned FIFO queues. It introduces dispatch helpers to route key construction, receipt handle encoding, and version validation based on the queue's partition count. Updates were made to message sending, receiving, deletion, and redriving logic to support these partitions. A review comment pointed out an improvement opportunity in the integration tests to use a more robust helper for calculating prefix scan bounds instead of manually appending bytes.

Comment on lines +181 to +184
// Cap the prefix scan at the generation byte so we do not
// drag in unrelated queues.
end := append([]byte(nil), legacyDataPrefix...)
end = append(end, 0xFF, 0xFF, 0xFF, 0xFF)
Contributor


medium

The logic for creating the end key for the prefix scan is incorrect. Appending 0xFF bytes does not correctly create an exclusive upper bound for all cases, especially when the prefix itself contains 0xFF bytes, which could cause the scan to read into adjacent key ranges (e.g., a different queue generation). This could lead to test flakiness or incorrect test behavior.

A more robust way to create an end key for a prefix is to find the next lexicographical key after the prefix. The codebase already has a helper for this, prefixScanEnd. You should use it here for correctness and consistency.

Suggested change
// Cap the prefix scan at the generation byte so we do not
// drag in unrelated queues.
end := append([]byte(nil), legacyDataPrefix...)
end = append(end, 0xFF, 0xFF, 0xFF, 0xFF)
// The end key for a prefix scan is the smallest key that does not
// have the prefix. This correctly bounds the scan to only keys
// within the given queue and generation.
end := prefixScanEnd(legacyDataPrefix)


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 443ceb5f9c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread adapter/sqs_messages.go
Comment on lines +873 to +875
for partition := uint32(0); partition < partitions; partition++ {
if len(delivered) >= opts.Max {
break

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid fixed partition-0 priority in receive fanout

When PartitionCount > 1, this loop always starts at partition 0 and exits as soon as opts.Max is filled, so repeated ReceiveMessage calls can permanently starve higher-index partitions whenever partition 0 has a sustained backlog. This only appears on partitioned queues, but once enabled it can make some message groups effectively undeliverable under load because later partitions are never scanned.
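One common mitigation for this kind of starvation — purely an illustrative sketch, not a claim about what the repo does or should adopt — is to rotate the fanout start index per call so no partition gets permanent priority:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// fanoutCursor advances once per receive call; its value picks which
// partition the fanout visits first.
var fanoutCursor atomic.Uint32

// partitionOrder yields all partitions starting at a rotating offset, so a
// backlogged partition 0 cannot permanently starve higher indices.
func partitionOrder(partitions uint32) []uint32 {
	start := fanoutCursor.Add(1) % partitions
	order := make([]uint32, 0, partitions)
	for i := uint32(0); i < partitions; i++ {
		order = append(order, (start+i)%partitions)
	}
	return order
}

func main() {
	fmt.Println(partitionOrder(4)) // first call starts at partition 1
	fmt.Println(partitionOrder(4)) // next call starts one further along
}
```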

Useful? React with 👍 / 👎.

return errors.New("receipt handle version mismatch for non-partitioned queue")
}
return nil
}


P2 Badge Reject out-of-range v2 receipt-handle partitions

The queue-aware version check now accepts any v2 handle on partitioned queues but never validates handle.Partition < meta.PartitionCount. A forged v2 handle with an out-of-range partition is then used to build partitioned keys in delete/change-visibility paths; the partition resolver fails closed for unknown partitions, which bubbles up as a generic internal error instead of ReceiptHandleIsInvalid. This is reachable once partitioned queues are enabled.

Useful? React with 👍 / 👎.
