Skip to content

feat(sqs): Phase 3 — Approximate counters + admin SPA queues pages#659

Closed
bootjp wants to merge 4 commits intomainfrom
feat/sqs-phase3-counters-console
Closed

feat(sqs): Phase 3 — Approximate counters + admin SPA queues pages#659
bootjp wants to merge 4 commits intomainfrom
feat/sqs-phase3-counters-console

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented Apr 26, 2026

Summary

Phase 3.A and 3.E of docs/design/2026_04_24_partial_sqs_compatible_adapter.md (renamed from _proposed_ in this PR — Phase 1+2 already landed). Phase 3.B / 3.C / 3.D are recorded as TODOs in §16.4 / §16.5 / §16.6 of the same doc with the rough scope each needs (XML query protocol, per-queue throttling, split-queue FIFO).

3.A — Approximate counters on GetQueueAttributes

GetQueueAttributes now returns ApproximateNumberOfMessages, ApproximateNumberOfMessagesNotVisible, and ApproximateNumberOfMessagesDelayed (previously: none of the three). Implementation walks the visibility index at the read TS and bucket-classifies each entry by (visible_at, available_at) vs now:

  • Visible = available_at ≤ now ∧ visible_at ≤ now — ready for the next ReceiveMessage.
  • NotVisible = available_at ≤ now ∧ visible_at > now — in flight after a receive bumped vis.
  • Delayed = available_at > now — sent with DelaySeconds, not yet eligible.

Bounded at 5,000 vis-index entries per call so a runaway queue cannot turn GetQueueAttributes into an O(stream) scan; over the budget the counts are reported as a lower bound and Truncated=true so callers can log. The counters fire only when the caller asks for them or for All — config-only callers still pay zero scan cost (matches AWS's "approximate" contract and keeps the existing GetQueueAttributes callers cheap).

3.E — Admin SPA queues pages (Section 13 superseded)

The original Section 13 console UI design (separate --consoleAddress listener, bearer token, dedicated go:embed bundle) is superseded — see §16.2 of the design doc for the rationale. Operator UI for SQS lives on the existing admin dashboard SPA (#649) instead, reusing its JWT cookie session, CSRF double-submit, role model, and embed.FS.

Backend (adapter/sqs_admin.go + internal/admin/sqs_handler.go):

  • SQSServer.AdminListQueues / AdminDescribeQueue / AdminDeleteQueue are SigV4-bypass entrypoints, mirroring the AdminListTables / AdminDescribeTable / AdminDeleteTable pattern.
  • sqsQueuesBridge in main_admin.go re-shapes adapter.AdminQueueSummaryadmin.QueueSummary, keeping internal/admin free of the heavy adapter dependency tree.
  • admin.QueuesSource is opt-in; deployments that don't run --sqsAddress leave /admin/api/v1/sqs/* off the wire and the SPA renders a soft "endpoint pending" notice on the 404.
  • Role re-evaluation against the live RoleStore on DELETE so a downgraded key cannot keep mutating with a still-valid JWT.
  • buildAPIMux dispatch refactored into apiRoutes.dispatch + hasResourcePrefix to keep cyclop ≤ 10 as new resource prefixes get added.

Frontend (web/admin/src/pages/Sqs{List,Detail}.tsx):

  • /sqs queue list with refresh + per-row link.
  • /sqs/:name detail showing FIFO badge, counters card (visible / in flight / delayed) with a "truncated" pill when the scan budget capped, raw attributes table, and a Delete confirmation Modal gated by RequireFullAccess.
  • api/client.ts gains listQueues / describeQueue / deleteQueue with the same AbortSignal pattern used for cluster / dynamo reads.

Out of scope (recorded in docs/design/...partial_sqs_compatible_adapter.md §16.2)

Test plan

  • go build ./...
  • go test -race ./internal/admin/...
  • go test -race -run TestSQS ./adapter/
  • go test -run TestStartAdmin .
  • golangci-lint run ./adapter/... ./internal/admin/... ./...0 issues.
  • cd web/admin && npm run build && npm run lint — strict TS, no //nolint
  • Manual smoke: start a node with --sqsAddress :4566 --adminEnabled --adminAllowInsecureDevCookie, create a queue, send N + delay one + receive K, hit /admin/sqs/<name> → counters match, Delete dialog returns to list.

Self-review (5 lenses)

  1. Data loss — read-only counter scan + Delete passthrough. Delete reuses the existing deleteQueueWithRetry OCC path.
  2. Concurrency — counter scan is best-effort + capped; AdminDeleteQueue re-checks leader before each attempt; admin role re-evaluated per request.
  3. Performance — counters bounded at 5K vis-entries per call, only fire when the attribute is requested. Bundle +1 KB gzip vs fix(admin/spa): drop redundant abort guard and add Modal a11y #650 baseline.
  4. Data consistency — counters use a single nextTxnReadTS and bucket from one now. The "approximate" contract matches AWS.
  5. Test coverage — two new regression tests around GetQueueAttributes cover both happy path and the "only when requested" property. Admin handler tests stay green via the nil Queues argument added to the existing startAdminServer call sites.

Stacking

This PR is stacked on #650 (Modal a11y + abort guard cleanup), which is itself stacked on #649 (admin SPA). Merge order: #649#650 → this. Once the chain merges, this rebases onto main with no conflicts.

bootjp added 4 commits April 26, 2026 17:57
Phase P3 of docs/design/2026_04_24_proposed_admin_dashboard.md.
The Go side (auth, JWT, cluster handler, dynamo handler, follower->
leader forwarding, embed.go skeleton) shipped in earlier PRs but
StaticFS was wired as nil, leaving /admin/* answering 404 for any
non-API path. This PR adds the React bundle + flips StaticFS on.

Source layout (web/admin/):
- React 18 + TypeScript + Vite per design doc Section 5.1
- Tailwind CSS for tokens; no UI framework dependency (kept the
  bundle at ~60 KB gzip per Section 7.2's ~150 KB target)
- vite.config.ts writes the build output straight into
  internal/admin/dist so `npm run build && go build` produces a
  single binary with the latest SPA

Pages (Section 5.2):
- /login - SigV4 access-key + secret-key form, server-side JWT
  Cookie issued, status-specific error messages for 401/403/429
- / (dashboard) - cluster overview, raft groups, summary cards
- /dynamo - table list + Create dialog (full role only)
- /dynamo/:name - schema + GSIs + Delete confirmation modal
- /s3 - bucket list + Create dialog (endpoint pending, renders
  graceful "endpoint pending" notice on 404)
- /s3/:name - bucket meta + ACL toggle + Delete (same)

Auth (Section 6):
- Cookie-based session (admin_session, HttpOnly) handled by browser
- CSRF double-submit: admin_csrf cookie value echoed in
  X-Admin-CSRF header on every POST/PUT/DELETE (apiFetch in
  src/api/client.ts)
- Read-only sessions hide Create/Delete affordances and surface a
  RequireFullAccess notice; backend re-evaluates the role
- Session expiry tracked in sessionStorage; expires_at trip clears
  in-memory state and redirects to /login

Backend wiring:
- main_admin.go: load admin.StaticFS() and pass to ServerDeps so
  /admin/assets/* and the SPA fallback start serving real files
- internal/admin/embed.go: //go:embed all:dist on the build output
- .gitignore: keep the placeholder index.html committed (so
  go:embed always has at least one file in a fresh clone), exclude
  hashed Vite assets and the npm/vite caches - committed bundles
  invariably drift from source

Verification:
- npm run build: 192 KB JS / 60 KB gzip + 14 KB CSS
- npm run lint (tsc strict, noUnusedLocals, noUncheckedSideEffects)
- go build ./...
- go test -race ./internal/admin/...
- golangci-lint run ./internal/admin/...

S3 admin endpoints (CreateBucket / ListBuckets / PutBucketAcl /
DeleteBucket) are not yet wired on the Go side; the SPA pages render
a soft "endpoint pending" notice on the 404 they currently get and
will populate transparently once the handlers ship.
Two converging review findings (Claude bot + Gemini Code Assist):

1. AbortSignal not plumbed through listTables / describeTable /
   listBuckets / describeBucket. useApiQuery created an
   AbortController and passed signal to the loader, but those four
   methods discarded it - so requests ran to completion after a
   component unmounted, wasting network and server connections. Now
   every read method mirrors cluster's signature (`signal?: AbortSignal`)
   and every page-level useApiQuery callsite passes it through.

2. The safe() wrapper in Dashboard.tsx was a no-op that received
   `signal` and never passed it on. The "404 doesn't cancel render"
   suppression it claimed to provide is already handled by SummaryCard
   testing `error?.status === 404 && pendingMessage`. Removed.

Drive-by polish from the same reviews:

- useApiQuery's effect deps no longer include `markUnauthorized`.
  Read it through markUnauthorizedRef so an AuthProvider remount
  (React Fast Refresh in dev, or any future wrapping change) does
  not invalidate every active query and trigger a fresh round-trip
  on every page. Mirrors the existing loaderRef pattern.

- Login.tsx now trims `secretKey` in addition to `accessKey`. SigV4
  secrets never contain whitespace, so trimming is safe and saves
  an operator from a cryptic 401 after pasting a key with a
  trailing newline.

Modal accessibility (no focus trap, missing aria-modal / role=dialog)
was flagged as low-priority polish for an internal operator tool.
Tracked as a follow-up; not addressed here to keep this change
focused on the functional bugs.

Verified with `npm run build` and `npm run lint` (tsc --strict
--noUnusedLocals --noUncheckedSideEffectImports), and `go build ./...`.
Two follow-ups from the review of #649:

1. useApi.ts: removed the `if (ctrl.signal.aborted) return;` check
   inside the catch handler. The cleanup function sets
   `cancelled = true` *before* calling `ctrl.abort()`, so any abort
   path is already covered by the preceding `cancelled` check. The
   second guard was harmless belt-and-suspenders dead code; the
   comment now records why one check is enough.

2. Modal.tsx: added the WAI-ARIA dialog contract (`role="dialog"`,
   `aria-modal="true"`, `aria-labelledby` pointing at the title)
   and a focus trap:
   - On open: store the previously-focused element, then focus the
     first focusable inside the dialog so keyboard users do not stay
     stranded on the page underneath. Empty dialogs fall back to
     focusing the dialog container itself with tabindex=-1.
   - Tab / Shift+Tab cycle within the dialog instead of escaping to
     the page underneath.
   - On close: restore focus to whoever opened the dialog (skipped
     when that element has been unmounted, e.g. the dialog deleted
     the row whose button opened it).
   - Escape and backdrop click stay wired through the existing
     `busy` gate so a half-applied operation cannot be dismissed.

   Implementation is vanilla DOM querying — focus-trap-react /
   react-focus-lock would have added 5–7 KB gzip for what is a
   ~40 line vanilla solution. Bundle delta from this PR is +0.4 KB
   gzip (60.23 -> 60.59 KB).

Verified with `npm run build` and `npm run lint` (tsc --strict
--noUnusedLocals --noUncheckedSideEffectImports), and `go build ./...`.

Stacked on top of #649 (feat/admin-dashboard-spa); merge that one
first, then this rebases cleanly onto main.
Phase 3.A and 3.E from the SQS design doc, lifecycle-bumped from
proposed_ to partial_. Phase 3.B / 3.C / 3.D are recorded as TODOs
in the same doc with the rough scope each needs (XML query protocol,
per-queue throttling, split-queue FIFO).

3.A — ApproximateNumberOfMessages{,NotVisible,Delayed} on
GetQueueAttributes. The handler scans the visibility index at the
read TS and classifies each entry by (visible_at, available_at) vs
now. Visible (available and vis<=now), NotVisible (in flight after
receive bumped vis), Delayed (available_at>now, never received).
Bounded at 5000 vis-index entries per call so a runaway queue
cannot turn GetQueueAttributes into an O(stream) scan; the
"truncated" signal is reported back so callers can log.

The counters fire only when the caller asks for any of the three
or for "All" — a config-only GetQueueAttributes call still pays
zero scan cost. Two regression tests pin both behaviors:

  - TestSQSServer_GetQueueAttributesApproxCounters
  - TestSQSServer_GetQueueAttributesApproxCountersOnlyWhenSelected

3.E — Admin SPA queues pages. The original Section 13 console UI
(separate listener, bearer token, dedicated bundle) is superseded
by extending the existing admin dashboard SPA. Reuses the JWT
cookie session, CSRF double-submit, role model, and embed.FS that
the dashboard already enforces — no second listener, no second
auth surface.

Backend (adapter/sqs_admin.go + internal/admin/sqs_handler.go):

  - SQSServer.AdminListQueues / AdminDescribeQueue / AdminDeleteQueue
    are SigV4-bypass entrypoints, mirroring the DynamoDB pattern.
    AdminDescribeQueue returns the queue meta plus the typed
    counters from §3.A (no string round-trip).
  - sqsQueuesBridge in main_admin.go re-shapes the adapter DTO
    into the admin package's QueueSummary, keeping internal/admin
    free of the heavy adapter dependency tree.
  - admin.QueuesSource interface is opt-in; deployments that don't
    run --sqsAddress leave /admin/api/v1/sqs/* off the wire and
    the SPA renders a soft "endpoint pending" notice on the 404.
  - Role re-evaluation against the live RoleStore on DELETE so a
    downgraded key cannot keep mutating with a still-valid JWT.
  - buildAPIMux dispatch loop refactored into apiRoutes.dispatch +
    hasResourcePrefix helper to keep cyclop <= 10 as new resource
    prefixes get added (3.E here, future endpoints later).

Frontend (web/admin/src/pages/Sqs{List,Detail}.tsx):

  - /sqs queue list with refresh + per-row link to detail.
  - /sqs/:name detail showing FIFO badge, counters card (visible /
    in-flight / delayed) with a "truncated" pill when the scan
    budget capped, raw attributes table, and a Delete confirmation
    Modal gated by RequireFullAccess.
  - api/client.ts gains listQueues / describeQueue / deleteQueue
    with the same AbortSignal pattern used for cluster / dynamo
    reads (Modal a11y / abort handling carry over from PR #650).

Out of scope (recorded in the design doc):

  - PurgeQueue from the SPA — the underlying adapter PurgeQueue
    method ships in PR #638 (still open). Trivial to add as a
    follow-up once that lands.
  - Send / Peek from the SPA — Send needs an SigV4-bypass
    entrypoint; Peek is genuinely new functionality (SQS has no
    peek primitive) and merits its own design discussion.
  - CreateQueue from the SPA — modest backend addition deferred to
    a separate PR to keep this one focused.

Verified with `go build ./...`, `go test -race ./internal/admin/...`,
`go test -race -run TestSQS ./adapter/`, `go test -run TestStartAdmin .`,
`golangci-lint run ./adapter/... ./internal/admin/... ./...`,
`cd web/admin && npm run build && npm run lint`.

Bundle delta: 199 KB JS / 61 KB gzip (+1 KB vs PR #650 baseline).

Stacked on PR #650 (admin SPA followups) which is itself stacked on
PR #649 (admin SPA). Merge order: #649#650 → this. Once the
chain merges this rebases onto main with no conflicts.
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 26, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cb4b7ea7-4980-4eda-9207-01ce12025568

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/sqs-phase3-counters-console

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements Phase 3.A of the SQS-compatible adapter, introducing approximate message counters for the GetQueueAttributes API and the admin dashboard. It also adds the admin SPA frontend and wires up management endpoints for listing, describing, and deleting queues. Feedback highlights a performance risk in the counter computation logic, which may perform up to 5,000 random reads, and a consistency bug where message classification uses the current system time instead of a timestamp derived from the MVCC readTS.

Comment thread adapter/sqs_counters.go
}
// visible_at > now: either Delayed or NotVisible. Load the
// data record so we can distinguish on available_at.
state, err := s.classifyHiddenCandidate(ctx, queueName, gen, string(kvp.Value), now, readTS)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The computeApproxCounters implementation performs a point lookup (GetAt) for every message in the visibility index that is not immediately visible (i.e., in-flight or delayed). With a budget of 5,000 entries, a single GetQueueAttributes call could trigger up to 5,000 random reads. While this is bounded, it can significantly increase latency for standard API callers and monitoring tools if the data records are not in the block cache. Consider if AvailableAtMillis can be included in the visibility index value to allow classification without point lookups, or document the potential latency impact for large queues.

Comment thread adapter/sqs_counters.go
// bound legal.
func (s *SQSServer) computeApproxCounters(ctx context.Context, queueName string, gen uint64, readTS uint64) (sqsApproxCounters, error) {
var out sqsApproxCounters
now := time.Now().UnixMilli()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using time.Now().UnixMilli() for bucket classification while scanning at a specific readTS can lead to inconsistencies. In an MVCC system, all operations within a single transaction or snapshot scan must share the same start timestamp to ensure atomicity and a consistent snapshot view. Using a timestamp derived from the HLC at readTS would ensure the classification is strictly consistent with the snapshot view.

References
  1. In an MVCC system, all operations within a single transaction must share the same start timestamp to ensure atomicity and a consistent snapshot view.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 26, 2026

Superseded by #670. The PR #659 branch became unrebaseable after main moved (PR #649 squash-merged, PR #658 added S3 admin endpoints, and the Approximate counters now live in adapter/sqs_catalog.go directly via scanApproxCounters — duplicating my old computeApproxCounters). The new PR re-applies only the SQS admin entrypoints + SPA queues pages on top of fresh main.

@bootjp bootjp closed this Apr 26, 2026
bootjp added a commit that referenced this pull request Apr 26, 2026
Replaces PR #659, which conflicted heavily after main moved (PR #649
squashed; PR #658 added S3 admin endpoints; the Approximate counters
implementation now lives directly in adapter/sqs_catalog.go).

This PR:

Backend (adapter/sqs_admin.go + internal/admin/sqs_handler.go):
- SQSServer.AdminListQueues / AdminDescribeQueue / AdminDeleteQueue
  are SigV4-bypass entrypoints, mirroring the AdminListTables /
  AdminListBuckets pattern.
- AdminDescribeQueue uses the existing scanApproxCounters from
  sqs_catalog.go (already on main) so the admin path returns the
  same Visible / NotVisible / Delayed numbers as
  GetQueueAttributes("All") would, taken at one snapshot read TS.
- sqsQueuesBridge in main_admin.go re-shapes
  adapter.AdminQueueSummary into admin.QueueSummary, keeping
  internal/admin free of the heavy adapter dependency tree —
  same pattern as dynamoTablesBridge / s3BucketsBridge.
- admin.QueuesSource is opt-in; deployments that don't run
  --sqsAddress leave /admin/api/v1/sqs/* off the wire and the
  SPA renders a soft "endpoint pending" notice on the 404.
- Role re-evaluation against the live RoleStore on DELETE so a
  downgraded key cannot keep mutating with a still-valid JWT.
- apiRouteTable.dispatch refactored: resourceHandlerFor extracted
  so the dispatcher stays under cyclop=10 as new resources land
  (Dynamo, S3, SQS, future).

Frontend (web/admin/src/pages/SqsList.tsx, SqsDetail.tsx):
- /sqs queue list with refresh + per-row link to detail.
- /sqs/:name detail showing FIFO badge, counters card (Visible /
  In-flight / Delayed), raw attributes table, and a Delete
  confirmation Modal gated by RequireFullAccess.
- api/client.ts gains listQueues / describeQueue / deleteQueue
  with the same AbortSignal pattern used for cluster / dynamo /
  s3 reads.
- Layout nav adds an SQS tab between DynamoDB and S3.

Out of scope (recorded in the SQS partial design doc §16.2):
- PurgeQueue from the SPA. Underlying purgeQueueWithRetry is on
  main; the admin entrypoint is a trivial follow-up.
- Send / Peek / CreateQueue from the SPA. Each needs its own
  adapter entrypoint and form UX; deferred to keep this PR
  focused.

Verified with go build ./..., go test -race ./internal/admin/...,
go test -race -run TestSQS ./adapter/, go test -run TestStartAdmin .,
golangci-lint run ./adapter/... ./internal/admin/... ./...
(0 issues, no //nolint), and cd web/admin && npm run build.
bootjp added a commit that referenced this pull request Apr 26, 2026
## Summary

Phase **3.B** of
[`docs/design/2026_04_24_partial_sqs_compatible_adapter.md`](https://github.com/bootjp/elastickv/blob/main/docs/design/2026_04_24_partial_sqs_compatible_adapter.md)
§16.4. Adds the AWS SQS **query protocol** (form-encoded request, XML
response) alongside the existing JSON protocol on the same listener —
older `aws-sdk-java` v1, `boto < 1.34`, and AWS CLI clients can now talk
to elastickv without modification. Detection happens per-request via
`Content-Type` / `X-Amz-Target` / `Action` presence; no flag, no
separate port. See the new proposal doc committed alongside.

This is an **architectural proof PR**: dispatch + decoding + encoding +
error envelope + the `*Core` refactor pattern are all in place, with
**three verbs** wired end-to-end as concrete proof. Each follow-up verb
is a parser + response struct + one switch arm — no further design work
needed.

### Verbs in this PR

| Verb | Why it's in the proof set |
|---|---|
| `CreateQueue` | Exercises the `Attribute.N.{Name,Value}`
indexed-collection parser. |
| `ListQueues` | Exercises the repeated-element XML response shape. |
| `GetQueueUrl` | Exercises the `<ErrorResponse>` envelope path via
`QueueDoesNotExist`. |

Every other verb returns a structured **501 `NotImplementedYet`** XML
envelope so operators see the gap explicitly. `SendMessage` /
`ReceiveMessage` / `DeleteMessage` are the highest-priority follow-ups
(they need the same `*Core` refactor on the FIFO send loop).

### Key design points

- **No new listener / no flag.** `pickSqsProtocol(*http.Request)`
decides per request. JSON and Query share the SQS port and the SigV4
path.
- **Wire-format-free cores.** `createQueue` / `listQueues` /
`getQueueUrl` are now `decode → core → encode` with `core(ctx, in) (out,
error)`. The JSON wrappers are unchanged in behavior; existing JSON
tests pass without modification.
- **DoS protection inherited.** Body read is bounded by the same
`sqsMaxRequestBodyBytes` the JSON path uses.
- **SigV4 unchanged.** The signed canonical request includes the
form-encoded body, so the existing SigV4 middleware verifies query
requests without code changes.
- **Error parity.** `<Code>` reuses the existing `sqsErr*` constants.
HTTP status mirrors what the JSON path returns, so SDK retry classifiers
work across protocols.
- **Cyclomatic budget honoured.** `handle()` was refactored to extract
`handleQueryProtocol` — `cyclop ≤ 10` per project rules, no `//nolint`.

### Known limitation (design §11.4)

`proxyToLeader`'s error writer always emits the JSON envelope, so a
query-protocol client hitting a follower during a leader flip sees one
JSON error before retry lands on the new leader. Follow-up PR threads
the detected protocol onto the request context so the proxy emits
matching XML.

## Test plan

- [x] `go build ./...` — clean
- [x] `go test -count=1 -race -run
"TestSQS|QueryProtocol|TestPickSqs|TestCollectIndexedKV|TestWriteSQSQueryError"
./adapter/` — passes
- [x] `golangci-lint run ./adapter/...` — `0 issues.`
- [x] `pickSqsProtocol` table tests cover documented edge cases (header
precedence, charset suffix, GET with Action, missing Action, nil
request).
- [x] `collectIndexedKVPairs` tests cover happy path, orphan Name, empty
input, unrelated prefix.
- [x] End-to-end via the in-process listener: CreateQueue / ListQueues /
GetQueueUrl round-trip on the query side.
- [x] **Cross-protocol parity**: a queue created via Query is visible
via JSON `GetQueueUrl` with the same URL.
- [x] Error envelope: 4xx maps to `<Type>Sender</Type>`, 5xx to
`<Type>Receiver</Type>`, namespace pinned, `x-amzn-ErrorType` header
set.
- [x] Unknown verb returns 501 with the `NotImplementedYet` XML
envelope.
- [x] Missing `Action` parameter returns 400 (per design §3).

## Self-review (5 lenses)

1. **Data loss** — Wire-format change only. Cores are byte-for-byte
identical to the previous handler bodies; no Raft / OCC / MVCC code is
touched.
2. **Concurrency** — No new shared state. Detection is request-local.
Body parsing is bounded.
3. **Performance** — One additional `Content-Type` string compare per
request on the dispatch hot path. Negligible.
4. **Data consistency** — `*Core` returns the same business-logic
outputs as before; the JSON tests are the regression net for parity.
Cross-protocol parity test pins behaviour.
5. **Test coverage** — 10 new test cases cover detection, parsing,
envelope shape, and three end-to-end verbs. Existing `TestSQS*` race
suite passes on the refactor.

## Stacking

This PR is **independent** — branched from current `main` (which has
#638 + #649 merged). It does not depend on PR #650 / PR #659. Merge
whenever ready.
bootjp added a commit that referenced this pull request Apr 26, 2026
)

## Summary

**Replaces #659**, which has unresolvable conflicts now that main has
moved on (PR #649 squashed into main; PR #658 added the S3 admin
endpoints; the Approximate counters implementation now lives directly in
`adapter/sqs_catalog.go` via `scanApproxCounters`). Rather than a
multi-day rebase, this PR re-applies the unique SQS admin code on a
fresh branch off current main.

What survived from #659:
- `adapter/sqs_admin.go` — `SQSServer.AdminListQueues` /
`AdminDescribeQueue` / `AdminDeleteQueue` (SigV4-bypass entrypoints,
same shape as `AdminListTables` / `AdminListBuckets`).
- `internal/admin/sqs_handler.go` — HTTP handler for
`/admin/api/v1/sqs/queues{,/{name}}` with role re-evaluation on DELETE.
- `web/admin/src/pages/SqsList.tsx` / `SqsDetail.tsx` — SPA pages for
the queues view + delete confirmation.

What changed during the re-apply:
- `AdminQueueCounters` is now `int64` (matches `sqsApproxCounters` from
main; bridge does no width conversion).
- `AdminDescribeQueue` calls main's `scanApproxCounters` instead of the
duplicate `computeApproxCounters` from the old branch — same numeric
output, single implementation.
- Dropped the `CountersTruncated` field; main's counter type doesn't
expose truncation. SPA's "truncated" pill came out with it.
- `apiRouteTable.dispatch` refactored to extract `resourceHandlerFor` so
the dispatcher stays under cyclop=10 as new resources land.

## Backend

- Re-evaluates the principal's role against the live `MapRoleStore` on
every `DELETE` so a downgraded key cannot keep mutating with a
still-valid JWT (Codex P1 pattern from earlier admin PRs).
- `admin.QueuesSource` is **opt-in**: deployments without `--sqsAddress`
leave `/admin/api/v1/sqs/*` off the wire; the SPA renders a soft
"endpoint pending" notice on the 404, mirroring the Tables / Buckets
`nil` contract.
- The bridge in `main_admin.go` (`sqsQueuesBridge`,
`convertAdminQueueSummary`, `translateAdminQueuesError`) keeps
`internal/admin` free of the heavy adapter dependency tree, same
architectural pattern as Dynamo and S3.

## Frontend

- **/sqs** queue list with refresh + per-row link to detail.
- **/sqs/:name** detail showing FIFO badge, counters card (Visible /
In-flight / Delayed), raw attributes table, and a Delete confirmation
`Modal` gated by `RequireFullAccess`.
- `api/client.ts` gains `listQueues` / `describeQueue` / `deleteQueue`
with the same `AbortSignal` pattern used for `cluster` / `dynamo` / `s3`
reads.
- Layout nav adds an SQS tab between DynamoDB and S3.

## Out of scope (recorded in
`docs/design/2026_04_24_proposed_sqs_compatible_adapter.md` Section 14,
deferred per the SQS partial doc §16.2)

- **PurgeQueue from the SPA** — the underlying `purgeQueueWithRetry`
adapter method is on main; the admin entrypoint is a trivial follow-up.
- **Send / Peek / CreateQueue from the SPA** — each needs its own
SigV4-bypass adapter entrypoint and form UX; deferred to keep this PR
focused.

## Test plan

- [x] `go build ./...` — clean
- [x] `go test -race ./internal/admin/...` — passes
- [x] `go test -race -run TestSQS ./adapter/` — passes
- [x] `go test -run TestStartAdmin .` — passes
- [x] `golangci-lint run ./adapter/... ./internal/admin/... ./...` — `0
issues.`, no `//nolint`
- [x] `cd web/admin && npm run build` — 49 modules, 199 KB JS / 61 KB
gzip + 14.7 KB CSS
- [ ] Manual smoke (after PR lands): start a node with `--sqsAddress
:4566 --adminEnabled --adminAllowInsecureDevCookie`, create a queue,
send a few messages, hit `/admin/sqs/<name>` → counters match
`GetQueueAttributes("All")`, Delete dialog returns to list.

## Self-review (5 lenses)

1. **Data loss** — Delete reuses the existing `deleteQueueWithRetry` OCC
path; counters are read-only. No new write paths.
2. **Concurrency** — Per-request leader check on Delete; counters scan
uses one snapshot read TS.
3. **Performance** — Counters bounded by main's existing
`sqsApproxCounterScanLimit`; admin reads are cheap point lookups + one
bounded scan.
4. **Data consistency** — `AdminDescribeQueue` and SigV4
`GetQueueAttributes` both call `scanApproxCounters` at a fresh
`nextTxnReadTS`, so a single point in time produces the same counters
via either surface.
5. **Test coverage** — Existing admin / SQS race suites stay green via
the new `nil` Queues argument added to `startAdminServer` call sites;
the new bridge is exercised by the cross-package build itself.

## Stacking

This PR is **independent** — branched from current `main` (which has the
merged versions of #649 / #658 / #650 / counter implementation). Closing
#659 in favour of this clean rewrite.
bootjp added a commit that referenced this pull request Apr 28, 2026
…FO) (#664)

## Summary

**Docs-only PR.** Two design proposals for the remaining Phase 3 SQS
items per
[`docs/design/2026_04_24_proposed_sqs_compatible_adapter.md`](https://github.com/bootjp/elastickv/blob/main/docs/design/2026_04_24_proposed_sqs_compatible_adapter.md)
Section 14 (Phase 3 bullets). Both items were explicitly called out as
needing separate design docs before any implementation work; this PR
lands those proposals so the implementation PRs have a reviewed
architecture to build on.

### 3.C — Per-queue throttling and tenant fairness
([proposal](docs/design/2026_04_26_proposed_sqs_per_queue_throttling.md))

Per-queue token-bucket throttling configured on queue meta (no separate
keyspace), evaluated at the SQS adapter layer on the leader (no Raft per
request), surfaced as the AWS `Throttling` error envelope so SDK
retry/backoff just works.

Key decisions:
- **Default-off**. Existing queues are unaffected; operators opt in per
queue via `SetQueueAttributes`.
- **Per-action buckets** (Send / Receive / Default) so a slow consumer
cannot pin the producer.
- **Per-leader buckets, no replication**. Worst case on failover: one
extra burst on the new leader. Acceptable per AWS-equivalent behaviour
at region failover boundaries; replicating would cost a Raft commit per
`SendMessage`.
- **Batch verbs charge by entry count**, not call count, with
all-or-nothing rejection (matches AWS).
- **Admin-only configuration plane**. Standard SQS clients see
`InvalidAttributeName` on the `Throttle*` attributes (matches AWS
behaviour for unknown attributes); the data-plane enforcement runs for
everyone.

### 3.D — Split-queue FIFO (high-throughput FIFO)
([proposal](docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md))

Per-`MessageGroupId` hash partitioning across multiple Raft groups,
mirroring AWS High Throughput FIFO. Within-group ordering preserved;
across-group throughput scales with the partition count.

Key decisions:
- **Existing single-partition FIFO queues stay byte-identical**
(`PartitionCount = 0` path is the legacy layout; no migration runs
implicitly).
- **Power-of-two partition counts only** (1, 2, 4, 8, 16, 32) so the
routing step is `hash & (N-1)` and future offline rebuilds stay
tractable.
- **Partition count is immutable after first SendMessage**. Live
re-partitioning would break ordering for in-flight messages of every
group whose hash bucket changed; out of scope.
- **Multi-PR rollout plan** with an explicit "gate of no return" called
out at PR 5 (the data-plane PR). PRs 1–4 are reversible no-ops on data
layout; once a partitioned FIFO holds real data, rollback means draining
and recreating the queue.
- **FNV-1a hash** (deterministic across processes / Go versions /
architectures). Risk of attacker-controlled `MessageGroupId` pinning all
traffic to one partition is documented and accepted (the feature is for
cooperative operators).

## Test plan

- [x] Markdown renders correctly on GitHub (manually previewed).
- [x] Cross-references resolve (the partial-doc rename in PR #659 is in
flight; both proposals reference the partial filename — they will
resolve once #659 merges, otherwise resolve to the current `_proposed_`
filename for a 1-character mismatch that is fine to leave for now).
- [x] No code changes; CI is irrelevant beyond the markdown lint pass.

## Self-review

This is a docs-only PR; the 5-lens self-review collapses to:

1. **Data loss / Concurrency / Performance / Consistency** — N/A, no
code touched.
2. **Test coverage** — N/A, no code touched. The proposals themselves
include a Testing Strategy section (§6 / §9) so the implementation PRs
have explicit acceptance criteria.

## Stacking

This PR is **independent** of #650, #659, and #662. Branched from
current `main`. Merge whenever ready — landing the proposal docs early
lets reviewers comment on the architecture before the implementation PRs
go up.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Documentation**
* Added detailed designs for optional per-queue token-bucket throttling
(rules, batch semantics, AWS-shaped Throttling errors with Retry-After,
metrics, rollout default-off) and for HT‑FIFO “split-queue” partitioning
(routing, immutability, gating/rollout).
* **New Features**
* Optional queue-level send/receive throttling with batch-aware charging
and Retry-After; HT‑FIFO partitioning with deterministic partition
routing and immutable partition attributes.
* **Tests**
* Extensive unit and integration tests for throttling, HT‑FIFO
partitioning, attribute validation, immutability, idempotency, and cache
invalidation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant