admin: AdminForward Phase 3 — register gRPC server + wire follower-side bridge#648
Conversation
Phase 3 of AdminForward (design 3.3) needs the production main.go path to plug a LeaderForwarder into the Dynamo handler so follower-side admin writes transparently forward to the leader. The handler-level WithLeaderForwarder option already shipped in PR #644; this commit exposes it via the high-level admin.NewServer composition surface so the wiring lives in main.go without reaching past the public API. A nil Forwarder leaves the existing behaviour intact (no-forwarder fallback path: ErrTablesNotLeader → 503 + Retry-After: 1), so single-node and leader-only deployments are unaffected. The integration test wires a stub forwarder through ServerDeps and confirms that a follower-side CreateTable replays the leader's 201 + payload verbatim — guarding against a future refactor that drops the forwarder before it reaches the Dynamo handler.
Phase 3 of AdminForward (design 3.3): the production path needs to construct an admin.LeaderForwarder that the dynamo HTTP handler can invoke when ErrTablesNotLeader bubbles up. The forwarder, in turn, needs a way to reach the leader over gRPC and a way to discover the leader's address. This commit lands the wiring infrastructure in main_admin_forward.go without touching main.go's runtime layout yet:

- adminForwardConnFactory bridges kv.GRPCConnCache to admin.GRPCConnFactory. The cache hands back *grpc.ClientConn; the admin layer wants a typed PBAdminForwardClient. pb.NewAdminForwardClient(conn) is the generated adapter.
- buildLeaderForwarder is the production constructor: resolver pulls coordinate.RaftLeader() (the default group's leader address), the factory above wraps the connection cache, and the local nodeID is stamped onto every forwarded request so the leader's audit log carries the trace (criterion 6).
- roleStoreFromFlags builds the access-key → role index from the same flag values admin.Config.RoleIndex consumes. The gRPC ForwardServer registration in startRaftServers can construct it without waiting for the (later) startAdminFromFlags admin-config parse.
- adminForwardServerDeps + readyForRegistration is the gate startRaftServers will use to decide whether to register the gRPC service at all. A nil TablesSource (cluster-only build) or a nil RoleStore (admin auth disabled) means the registered service would 500 every forwarded call, so silently skipping is the correct behaviour.

Three sets of unit tests cover the constructors' input validation, the conn-factory's empty-address path, and the readyForRegistration truth table — all deterministic, no Raft or TCP required. The runtime wiring (runner.start reorder + pb.RegisterAdminForwardServer in startRaftServers + ServerDeps.Forwarder plumb-through) ships in follow-up commits in this same PR.
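The conn-factory bridge described above can be sketched with stand-in types (the real code wraps `kv.GRPCConnCache` and calls `pb.NewAdminForwardClient`; `clientConn`, `connCache`, and `adminForwardClient` below are illustrative only):

```go
package main

import (
	"errors"
	"fmt"
)

// clientConn stands in for *grpc.ClientConn.
type clientConn struct{ addr string }

// connCache stands in for kv.GRPCConnCache: dial once per address, then reuse.
type connCache struct{ conns map[string]*clientConn }

func (c *connCache) Get(addr string) (*clientConn, error) {
	if addr == "" {
		return nil, errors.New("conn cache: empty address")
	}
	if cc, ok := c.conns[addr]; ok {
		return cc, nil
	}
	cc := &clientConn{addr: addr}
	c.conns[addr] = cc
	return cc, nil
}

// adminForwardClient stands in for the generated typed client.
type adminForwardClient struct{ conn *clientConn }

// adminForwardConnFactory mirrors the bridge: the cache hands back a raw
// connection and the factory wraps it in the typed client the admin layer
// wants (pb.NewAdminForwardClient in the real code). The empty-address
// error path is the one the unit tests pin down.
func adminForwardConnFactory(cache *connCache) func(addr string) (*adminForwardClient, error) {
	return func(addr string) (*adminForwardClient, error) {
		conn, err := cache.Get(addr)
		if err != nil {
			return nil, err
		}
		return &adminForwardClient{conn: conn}, nil
	}
}

func main() {
	factory := adminForwardConnFactory(&connCache{conns: map[string]*clientConn{}})
	if _, err := factory(""); err != nil {
		fmt.Println("empty address rejected")
	}
	c, _ := factory("10.0.0.1:9000")
	fmt.Println(c.conn.addr)
}
```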
Phase 3 of AdminForward (design 3.3). Closes the wiring loop: the admin HTTP dynamo handler can now hand ErrTablesNotLeader writes off to the in-process LeaderForwarder, and the leader-side gRPC AdminForward service is registered on every node so the forwarded RPC actually lands somewhere. Changes:

- runtimeServerRunner.start() now constructs the DynamoDBServer BEFORE startRaftServers. The previous order — raft listeners first, dynamoServer afterwards — was fine when no service needed the dynamoServer at registration time, but the leader-side gRPC ForwardServer needs it as its TablesSource. Both servers listen on different addresses; the dynamo HTTP listener accepting traffic before raft TCP listeners are up is no different from the existing startup-race semantics.
- startRaftServers gains an adminForwardServerDeps parameter and delegates registration to registerAdminForwardServer in main_admin_forward.go. The bundle's readyForRegistration gate decides silently whether to register at all (cluster-only builds with no dynamoServer, or builds with admin auth disabled, skip it).
- runtimeServerRunner gains a roleStore field, populated in startServers from the same flag values admin.Config.RoleIndex consumes inside startAdminFromFlags. The two paths build their own access-key→role index from the same input strings; cost is microseconds at startup.
- startServers constructs a single *kv.GRPCConnCache and forwards it (along with the Coordinator) to startAdminFromFlags. The cache is shared with any future bridge that dials the leader's gRPC ports.
- startAdminFromFlags accepts coordinate + connCache, builds a production LeaderForwarder via buildAdminLeaderForwarder, and passes it through ServerDeps.Forwarder. Single-node / leader-only builds (nil connCache or nil coordinate) intentionally fall back to the no-forwarder path — the dynamo handler still surfaces 503 + Retry-After:1 in those cases.
This commit lights up acceptance criterion 2 (follower forwards transparently) end-to-end and tightens criterion 3's election-period behaviour: a follower with a configured forwarder hands the request to the leader instead of bouncing the SPA with a synthetic 503. Existing main_admin_test.go callers updated to pass nil for the new LeaderForwarder argument; the tests already exercise the no-forwarder path.
@claude review
📝 Walkthrough

The PR adds leader-forwarding support to the admin API by introducing a

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant AdminHTTP as Admin HTTP Server
    participant Forwarder
    participant GRPCCache as gRPC Conn Cache
    participant Leader
    Client->>AdminHTTP: POST dynamo create-table request
    AdminHTTP->>AdminHTTP: Check ServerDeps.Forwarder != nil
    alt Forwarder configured
        AdminHTTP->>Forwarder: Forward request (TableName, etc.)
        Forwarder->>GRPCCache: Get or dial leader connection
        GRPCCache->>Leader: Establish/reuse gRPC connection
        Leader-->>GRPCCache: Return *grpc.ClientConn
        GRPCCache-->>Forwarder: Return client connection
        Forwarder->>Leader: Call AdminForward.CreateTable
        Leader-->>Forwarder: Response
        Forwarder-->>AdminHTTP: Return forwarded response
    else No forwarder (fallback)
        AdminHTTP->>AdminHTTP: Return 503 or retry locally
    end
    AdminHTTP-->>Client: HTTP response (201/503)
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~28 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Claude finished @bootjp's task in 3m 6s — PR Review: AdminForward Phase 3
Overall the wiring is clean and the design rationale is well documented. Three issues worth addressing before merge, one of which is a genuine resource leak, and two minor quality items. Issue 1 —
Actionable comments posted: 2
🧹 Nitpick comments (5)
main_admin.go (1)
128-146: Two-stage validation has a small inconsistency — minor, can defer.
`buildAdminLeaderForwarder` chooses between three outcomes for missing inputs:

- `coordinate == nil` or `connCache == nil` → `(nil, nil)` (silent fallback to 503 + Retry-After:1)
- `nodeID == ""` → explicit error
- otherwise → delegate to `buildLeaderForwarder`, which itself validates `coordinate`/`connCache` again and would also catch `nodeID == ""` via `admin.NewGRPCForwardClient`

The pre-check on `nodeID` is only useful when `coordinate` and `connCache` are non-nil (otherwise the early return wins and the empty `nodeID` is silently tolerated). Production callers always pass non-nil collaborators, so this is unreachable in practice, but it makes the error contract slightly fuzzy: whether an empty `nodeID` is a hard failure or a fallback signal depends on the order of checks.

If you want to make the contract crisper, validate `nodeID` first (matching upstream `resolveRuntimeInputs`'s "raftId is required" guarantee) so the fallback path is only ever taken because the wiring is intentionally incomplete. Not a blocker.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@main_admin.go` around lines 128-146: buildAdminLeaderForwarder currently early-returns (nil, nil) when coordinate or connCache are nil but only checks nodeID afterwards, creating an inconsistent contract; change the validation order so nodeID is validated first (mirroring resolveRuntimeInputs' "raftId is required" guarantee) and return an explicit error if nodeID == "" before checking coordinate/connCache, then keep the existing fallback-to-(nil, nil) behavior for missing coordinate or connCache and finally delegate to buildLeaderForwarder when all inputs are present.

main_admin_forward.go (1)
95-120: Acknowledged divergence from `admin.Config.RoleIndex` is fine, but worth documenting in one place.

The comment explains why `roleStoreFromFlags` runs without overlap-rejection (`admin.Config.Validate` owns that), and the test fixture covers the four quadrants. The divergence becomes interesting only if `admin.Config.RoleIndex` adds new behavior (e.g., normalization, deduplication) — keep this helper in lockstep, or wire `admin.Config.RoleIndex()` into both paths once the config has been parsed (see the related comment in `main.go`).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@main_admin_forward.go` around lines 95-120: the comment notes a maintained divergence between roleStoreFromFlags and admin.Config.RoleIndex that could drift if RoleIndex adds behavior; update the code and docs so they stay in sync: either (A) add a clear single-source comment near roleStoreFromFlags referencing admin.Config.RoleIndex and the related main.go note (so future editors know to keep them in lockstep), or (B) refactor so the code path that has a parsed config uses admin.Config.RoleIndex directly (e.g., call admin.Config.RoleIndex from startAdminFromFlags instead of duplicating logic in roleStoreFromFlags), keeping roleStoreFromFlags only as the minimal flag-only fallback with a comment stating it intentionally differs.

main_admin_forward_test.go (1)
111-125: Optional: make `TestAdminForwardServerDeps_ReadyForRegistration` table-driven.

The other tests in this file are already table-driven; this one expresses four cases as separate `require` lines. Folding them into a `cases := []struct{...}` table keeps the style consistent with the rest of the file.

♻️ Suggested table-driven form

```diff
-func TestAdminForwardServerDeps_ReadyForRegistration(t *testing.T) {
-	require.False(t, adminForwardServerDeps{}.readyForRegistration())
-	require.False(t, adminForwardServerDeps{tables: dummyTablesSource{}}.readyForRegistration())
-	require.False(t, adminForwardServerDeps{roles: admin.MapRoleStore{}}.readyForRegistration())
-	require.True(t, adminForwardServerDeps{
-		tables: dummyTablesSource{},
-		roles:  admin.MapRoleStore{},
-	}.readyForRegistration())
-}
+func TestAdminForwardServerDeps_ReadyForRegistration(t *testing.T) {
+	cases := []struct {
+		name string
+		deps adminForwardServerDeps
+		want bool
+	}{
+		{"both nil", adminForwardServerDeps{}, false},
+		{"only tables", adminForwardServerDeps{tables: dummyTablesSource{}}, false},
+		{"only roles", adminForwardServerDeps{roles: admin.MapRoleStore{}}, false},
+		{"both set", adminForwardServerDeps{tables: dummyTablesSource{}, roles: admin.MapRoleStore{}}, true},
+	}
+	for _, tc := range cases {
+		t.Run(tc.name, func(t *testing.T) {
+			require.Equal(t, tc.want, tc.deps.readyForRegistration())
+		})
+	}
+}
```

As per coding guidelines: "Co-locate test files in the same package and use table-driven test patterns."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@main_admin_forward_test.go` around lines 111-125: refactor TestAdminForwardServerDeps_ReadyForRegistration into a table-driven test: create a slice of cases each containing a name, an adminForwardServerDeps value (using dummyTablesSource{} and admin.MapRoleStore{} as needed) and the expected bool, then loop over cases and run t.Run(tc.name, func(t *testing.T) { require.Equal(t, tc.want, tc.deps.readyForRegistration()) }). The test logic remains identical but follows the file's table-driven style.

main.go (2)
952-998: Optional: tag the forward logger with a more specific component name.

`forwardLogger := slog.Default().With(slog.String("component", "admin"))` shares the same component label as the admin HTTP listener (`buildAdminHTTPServer` does the same). When operators tail logs to debug a follower→leader bridge issue, they can't distinguish the HTTP and gRPC sides by component alone.

```diff
-	forwardLogger := slog.Default().With(slog.String("component", "admin"))
+	forwardLogger := slog.Default().With(slog.String("component", "admin-forward"))
```

As per coding guidelines: "Use structured `slog` logging with stable keys".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@main.go` around lines 952-998: the forwardLogger in startRaftServers currently reuses the generic component label ("admin"), which makes HTTP vs gRPC forward logs indistinguishable; change the forwardLogger initialization to use a more specific structured key/value (for example component "admin_forward" or "admin_forward_grpc", optionally including rt.spec.id/group id) so logs from registerAdminForwardServer are clearly separable from buildAdminHTTPServer logs; ensure the new logger is passed unchanged to registerAdminForwardServer and update any related tests or log expectations that rely on the old component key.
662: Role index is parsed twice from the same flags — risk of drift.

`roleStoreFromFlags(parseCSV(*adminFullAccessKeys), parseCSV(*adminReadOnlyAccessKeys))` builds the same access-key→role map that `admin.Config.RoleIndex()` will produce later inside `buildAdminHTTPServer`. The comment in `roleStoreFromFlags` acknowledges that overlap-rejection is only enforced on the HTTP-path `Validate()` call. Practically:

- An overlapping key (in both `--adminFullAccessKeys` and `--adminReadOnlyAccessKeys`) ends up as `RoleReadOnly` in the gRPC ForwardServer's role store, while `admin.Config.Validate()` rejects the same configuration and aborts startup.
- Between leader-side gRPC registration (in `runner.start()`) and the later validation failure in `startAdminFromFlags`, the ForwardServer is briefly serving with the "wrong" mapping. The window is short and the process aborts, but a forwarded write that races into that window could be authorized inconsistently with what the operator declared.

Consider building the role store once (e.g., parse `admin.Config` first, then derive the role store from `Config.RoleIndex()`) so both paths share the validated map.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@main.go` at line 662: parseCSV is currently used twice, producing a role map in roleStoreFromFlags that can differ from admin.Config.RoleIndex() and causing a brief inconsistent authorization window; instead parse and validate the admin config first (the same flow used by startAdminFromFlags/admin.Config.Validate), then derive the role store from admin.Config.RoleIndex() and pass that to the gRPC ForwardServer/runner.start; update the call site that currently does roleStoreFromFlags(parseCSV(...), parseCSV(...)) to obtain the role index from the validated admin.Config and use that single source (remove the duplicate parseCSV usage and ensure overlap rejection from admin.Config.Validate() applies before runner.start).
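The single-source suggestion above can be sketched as one validated builder that both the HTTP path and the gRPC ForwardServer would consume. The function name, role labels, and error wording below are illustrative, not the project's real API:

```go
package main

import (
	"errors"
	"fmt"
)

// buildRoleIndex validates the overlap once and returns a single
// access-key → role map. With this in place, the gRPC ForwardServer can
// never briefly serve a mapping that the HTTP-path Validate() would reject.
func buildRoleIndex(fullAccess, readOnly []string) (map[string]string, error) {
	idx := make(map[string]string, len(fullAccess)+len(readOnly))
	for _, k := range fullAccess {
		idx[k] = "full"
	}
	for _, k := range readOnly {
		if _, dup := idx[k]; dup {
			// Overlap rejection happens before any server registers.
			return nil, errors.New("access key present in both full-access and read-only lists: " + k)
		}
		idx[k] = "readonly"
	}
	return idx, nil
}

func main() {
	if _, err := buildRoleIndex([]string{"AK1"}, []string{"AK1"}); err != nil {
		fmt.Println("overlap rejected before startup completes")
	}
	idx, _ := buildRoleIndex([]string{"AK1"}, []string{"AK2"})
	fmt.Println(idx["AK2"])
}
```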
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@main.go`:
- Around line 657-668: The AdminForward gRPC path is being prepared regardless
of the --adminEnabled flag because roleStore (via roleStoreFromFlags/parseCSV)
and connCache are created unconditionally and
adminForwardServerDeps.readyForRegistration() doesn't check adminEnabled; fix by
gating the creation/registration of the forward/server deps on the adminEnabled
flag: either stop building/attaching roleStore/connCache to forwardDeps before
startRaftServers (or wrap the call site that constructs forwardDeps) when
*adminEnabled is false, or add an explicit check for *adminEnabled inside
registerAdminForwardServer (or inside
adminForwardServerDeps.readyForRegistration()) so registerAdminForwardServer is
a no-op unless admin is enabled; reference roleStoreFromFlags, parseCSV,
connCache, startRaftServers and registerAdminForwardServer to locate the
changes.
- Line 668: The connCache (kv.GRPCConnCache) created inside startServers() is
never closed on shutdown; call connCache.Close() during process cleanup by
either returning the connCache from startServers() so run() can register it with
the cleanup stack, or register a shutdown goroutine inside startServers() that
runs on context cancellation (e.g., using the errgroup's eg.Go() or a ctx.Done()
watcher) and invokes connCache.Close(); update startServers() and run() to
propagate and register the cleanup accordingly so all grpc.ClientConn entries
are closed on shutdown.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7cf20a65-0c07-49c8-8f5e-2a235ce2504c
📒 Files selected for processing (7)
internal/admin/server.go, internal/admin/server_test.go, main.go, main_admin.go, main_admin_forward.go, main_admin_forward_test.go, main_admin_test.go
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 19f6b70a25
```go
	// runner.start() below, before startAdminFromFlags has a chance
	// to parse the admin config) can re-use the same access-key map
	// the HTTP path will eventually populate.
	roleStore := roleStoreFromFlags(parseCSV(*adminFullAccessKeys), parseCSV(*adminReadOnlyAccessKeys))
```
Gate AdminForward registration on adminEnabled
This wires roleStore from access-key flags unconditionally, so even when -adminEnabled=false a non-empty -adminFullAccessKeys/-adminReadOnlyAccessKeys still makes forwardDeps.readyForRegistration() true and startRaftServers registers AdminForward. In that mode the admin surface is expected to be off, but this commit re-exposes table write paths over gRPC (the forward server only re-checks access-key role mapping), so templated deployments that keep key flags set while toggling adminEnabled can still process admin mutations.
Four findings from Claude / Codex / CodeRabbit on PR #648:

1. **P1 / Major**: AdminForward gRPC registration was gated only on `adminForwardServerDeps.readyForRegistration()` (tables + roles both non-nil), neither of which referenced `*adminEnabled`. A node started with `-adminEnabled=false` but with admin access keys still configured would have registered the leader-side gRPC AdminForward service and accepted forwarded admin writes — re-exposing the table-mutation surface a follower-direct admin call could reach. Fix: build roleStore + connCache only when `*adminEnabled`. With admin disabled, `forwardDeps.roles` stays nil and `registerAdminForwardServer`'s readyForRegistration gate skips the gRPC bind entirely.
2. **Resource leak (Claude Issue 1, CodeRabbit Minor)**: connCache was created in `startServers()` but never closed. Every address dialled accumulates an open HTTP/2 connection that is never released. Fix: a shutdown goroutine that drains the cache via `Close()` on `in.ctx.Done()`, wired through the existing errgroup.
3. **Lint hygiene (Claude Issue 2)**: `//nolint:nilnil` suppresses a linter that is not enabled in `.golangci.yaml` AND violates CLAUDE.md's "Avoid //nolint — refactor instead" rule. The directive was pure dead weight. Fix: drop the directive; the doc comment on the function and the inline comment already explain the nil-means-disabled contract.
4. **Coverage gap (Claude Issue 3)**: `buildAdminLeaderForwarder`'s nil-coordinate / nil-cache short-circuit was untested. A future refactor that drops the guard would silently pass a nil collaborator into `buildLeaderForwarder`, which would either crash on the nil deref or build a forwarder that panics on the first request. Fix: `TestBuildAdminLeaderForwarder_NilGateReturnsNoForwarder` covers all three nil combinations + the empty-nodeID error path, plus a happy-path companion.
@claude review

964ecd4 addresses all four findings from the previous review pass:
Claude finished @bootjp's task in 1m 54s — PR Review: AdminForward Phase 3 — follow-up on 964ecd4
All four issues from the previous review pass are correctly addressed. Details below. Finding 1 —
P2 slice 1 of [docs/design/2026_04_24_proposed_admin_dashboard.md](https://github.com/bootjp/elastickv/blob/main/docs/design/2026_04_24_proposed_admin_dashboard.md). Read-only S3 admin endpoints land first so the SPA's S3List and S3Detail pages stop hitting 404. Write paths (POST/PUT/DELETE/ACL) ship in slice 2 together with AdminForward integration; until then they reply 405.

## Summary

- **`*adapter.S3Server.AdminListBuckets` / `AdminDescribeBucket`** — SigV4-bypass read methods. Share `loadBucketMetaAt` + the metadata-prefix scan with the SigV4 path, so a SigV4 `listBuckets` and the admin dashboard cannot drift.
- **`internal/admin/buckets_source.go`** — `BucketsSource` interface + `BucketSummary` DTO + sentinel errors (`ErrBucketsForbidden` / `ErrBucketsNotLeader` / `ErrBucketsNotFound` / `ErrBucketsAlreadyExists`). `ErrBucketsForbidden` is wired in this slice; the others are reserved for slice 2 to keep the error vocabulary additive.
- **`internal/admin/s3_handler.go`** — `S3Handler` with `handleList` (paginated) + `handleDescribe` (404 for missing). Sub-paths under `/buckets/{name}/` (the future `/acl`) return 404 here so a SPA bug pointed at this build cannot accidentally hit the describe path with a `"{name}/acl"` string.
- **Shared `list_pagination.go`** — centralises base64url cursor + limit parsing/clamping so the Dynamo and S3 handlers cannot diverge on validation rules. Drops the now-redundant `parseDynamoListLimit` / `decodeDynamoNextToken` / `encodeDynamoNextToken` from `dynamo_handler.go` in favour of `parseListPaginationParams` / `decodeListNextToken` / `encodeListNextToken`.
- **`apiRouteTable` in `server.go`** — bundles the precomposed middleware chains and dispatches by URL prefix. `buildAPIMux`'s body went from 13 cyclomatic branches to 6, leaving headroom for the next resource family (SQS queues, etc.) to land without another refactor.
- **Production wiring** — `main_s3.go`'s `startS3Server` now returns `*adapter.S3Server`; `runtimeServerRunner` stores it; `startAdminFromFlags` accepts it and threads `bucketsBridge` → `admin.BucketsSource` → `ServerDeps.Buckets`.

## What is NOT in this PR

- `AdminCreateBucket` / `AdminPutBucketAcl` / `AdminDeleteBucket` on the adapter and their HTTP/AdminForward counterparts. Slice 2.
- `RoleStore` plumb-through for S3. Slice 2 (read-only is fine for any authenticated session today; the write paths will need it).
- `forwarded_from` audit trail for S3 admin writes. Slice 2 — needs the `pb.AdminOperation` enum extended.
- The merge-freeze-acceptable changes to `proto/admin_forward.proto`. Slice 2.

## Test plan

- [x] `go build ./...`
- [x] `go vet ./...`
- [x] `golangci-lint run` (admin + main + adapter packages: 0 issues)
- [x] `go test ./internal/admin/ -count=1 -race` — 13 new handler tests pass
- [x] `go test . -count=1 -race` — main package passes
- [x] `go test -run "TestS3Server_Admin" ./adapter/ -count=1 -race` — 4 new adapter tests pass
- [ ] `go test ./adapter/` times out at 120s due to a pre-existing flake (verified earlier on PR #648 against `main` — unrelated to this branch)
- [ ] End-to-end smoke against a 3-node cluster — needs the next slice + the SPA PR (#649) to merge before there's anything for the SPA to render

## Acceptance criteria coverage (Section 4.1)

| Endpoint | This PR |
|---|---|
| `GET /admin/api/v1/s3/buckets` | ✓ |
| `POST /admin/api/v1/s3/buckets` | ⏳ slice 2 |
| `GET /admin/api/v1/s3/buckets/{name}` | ✓ |
| `PUT /admin/api/v1/s3/buckets/{name}/acl` | ⏳ slice 2 |
| `DELETE /admin/api/v1/s3/buckets/{name}` | ⏳ slice 2 |

## Self-review (5 lenses)

1. **Data loss**: No FSM / Raft / Pebble path changes. `AdminListBuckets` / `AdminDescribeBucket` are read-only and use the same `ScanAt` / `loadBucketMetaAt` as the SigV4 path.
2. **Concurrency**: No new shared state. The handler holds only the immutable `BucketsSource` interface; the bridge holds only the immutable `*S3Server`. `pinReadTS` is the same pattern the SigV4 path uses for snapshot stability.
3. **Performance**: One additional metadata scan per `GET /admin/api/v1/s3/buckets`, identical to the SigV4 listBuckets. Pagination caps at 1000 buckets per page (silent clamp on oversize). No hot-path changes.
4. **Data consistency**: Read-only; no commit-ts ordering risk. Admin reads use the same lease-read window as the SigV4 path.
5. **Test coverage**: 13 handler tests + 4 adapter tests + the existing dynamo/cluster test surfaces still pass after the shared-helper refactor. The pagination cursor round-trip test pins the wire format so a future cursor-encoding change cannot ship without breaking the test.

## Summary by CodeRabbit

### Release Notes

- **New Features**
  - Added S3 bucket admin API endpoints to list all buckets and retrieve detailed bucket information, including creation timestamps.
  - Implemented pagination support with cursor-based next_token for browsing large bucket collections.
  - Added proper authorization checks and error handling for admin bucket operations (403 for forbidden, 404 for not found, 500 for errors).
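The shared pagination helpers described above (base64url cursor round-trip plus limit parsing with a silent clamp at 1000) can be sketched as follows. The helper names echo the PR's `parseListPaginationParams` / `encodeListNextToken` / `decodeListNextToken`, but the signatures and the empty-limit default are assumptions for illustration:

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strconv"
)

const maxListLimit = 1000

// parseListLimit parses the limit query parameter; out-of-range positive
// values are silently clamped at 1000, matching the "silent clamp on
// oversize" behaviour described in the self-review.
func parseListLimit(raw string) (int, error) {
	if raw == "" {
		return maxListLimit, nil // assumed default when the param is absent
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return 0, fmt.Errorf("invalid limit %q", raw)
	}
	if n > maxListLimit {
		return maxListLimit, nil
	}
	return n, nil
}

// encodeListNextToken / decodeListNextToken pin a base64url cursor round-trip.
func encodeListNextToken(lastKey string) string {
	return base64.RawURLEncoding.EncodeToString([]byte(lastKey))
}

func decodeListNextToken(token string) (string, error) {
	b, err := base64.RawURLEncoding.DecodeString(token)
	return string(b), err
}

func main() {
	n, _ := parseListLimit("5000")
	fmt.Println(n) // 1000
	key, _ := decodeListNextToken(encodeListNextToken("bucket-042"))
	fmt.Println(key) // bucket-042
}
```

Centralising these in one file is what keeps the Dynamo and S3 handlers from diverging on cursor format or limit rules.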
Per `docs/design/README.md`'s lifecycle convention, the admin dashboard design doc is now "partial":

- **P1** (DynamoDB CRUD + AdminForward) — shipped via #634, #635, #644, #648
- **P2** (S3 buckets list/create/delete/ACL + DescribeTable) — shipped via #658, with #669 + #673 in flight
- **P3** (React SPA + embed) — shipped via #649, #650
- **P4** (TLS / role / CSRF / operator docs) — TLS, role, CSRF are already live in P1; operator docs in #674

Independent of the in-flight slice 2 PRs (#669/#673) and the docs PR (#674) — this rename only reflects what is already on main today, plus an "Implementation status" table mapping each phase to the PR it landed in.

## What this PR changes

- `git mv` the design doc from `2026_04_24_proposed_admin_dashboard.md` to `2026_04_24_partial_admin_dashboard.md` so its history follows
- Add an "Implementation status" header table indexing each phase to the PRs that landed it
- List the outstanding open items so future readers know what is still owed against the original proposal:
  - AdminForward acceptance criterion 5 (rolling-upgrade compat flag) — deferred
  - S3 object browser — explicitly out of scope per Section 2 Non-goals
  - TLS cert hot-reload — restart-to-rotate is the documented model

When the rolling-upgrade flag lands, the doc gets renamed once more to `2026_04_24_implemented_admin_dashboard.md` per the README's lifecycle convention.
P2 slice 2b: a follower-side S3 admin write (POST /buckets, PUT
/buckets/{name}/acl, DELETE /buckets/{name}) now hands off to the leader
transparently, completing the same end-to-end forwarding contract Dynamo
writes received in #644 + #648.
**Stacked on #669 (P2 slice 2a).** Once #669 merges, this rebases
cleanly onto main.
## Summary
- **Proto** — three new `ADMIN_OP` enum values (`CREATE_BUCKET` /
`DELETE_BUCKET` / `PUT_BUCKET_ACL`) appended after the Dynamo block so
existing wire integers stay stable. Regenerated with the pinned protoc
29.3 / protoc-gen-go 1.36.11 / protoc-gen-go-grpc 1.6.1.
- **Leader-side** — `ForwardServer.WithBucketsSource` lets deployments
wire the S3 dispatcher optionally; Dynamo-only builds keep it nil and
the new operations return 501. Three new dispatch arms mirror the
leader-direct HTTP path's payload contract (NUL-byte rejection, 64 KiB
limit, `DisallowUnknownFields`, trailing-token rejection, slash-in-name
rejection). `forwardBucketsErrorResponse` mirrors
`forwardErrorResponse`: `ErrBucketsForbidden` / `NotLeader` / `NotFound`
/ `AlreadyExists` / `NotEmpty` + `*ValidationError` each map to the same
HTTP status `writeBucketsError` produces, so forwarded and leader-direct
responses are byte-identical from the SPA's view.
- **Follower-side** — `LeaderForwarder` interface gains
`ForwardCreateBucket` / `ForwardDeleteBucket` / `ForwardPutBucketAcl`.
`ForwardPutBucketAcl` carries both the bucket name (URL path) and the
new ACL (request body) in one JSON payload — same approach
`ForwardDeleteTable` takes for the table name.
- **Handler integration** — `S3Handler.WithLeaderForwarder` +
`tryForwardCreateBucket` / `tryForwardPutBucketAcl` /
`tryForwardDeleteBucket` gated on `ErrBucketsNotLeader && forwarder !=
nil`. `writeForwardResult` / `writeForwardFailure` mirror the Dynamo
handler's pattern (nosniff + Cache-Control:no-store + Retry-After:1 on
503).
- **Wiring** — `runtimeServerRunner.start()` now creates
`*adapter.S3Server` before `startRaftServers` so the leader-side
`ForwardServer` registration sees both adapters.
`adminForwardServerDeps.buckets` is wired through
`registerAdminForwardServer`. `ServerDeps.Forwarder` now plumbs into
`S3Handler` too.
## What is NOT in this PR
- Rolling-upgrade compatibility flag (criterion 5) — still deferred
behind a cluster-version bump.
## Test plan
- [x] `go build ./...`
- [x] `go vet ./...`
- [x] `golangci-lint run` (admin + main + adapter + proto: 0 issues)
- [x] `go test ./internal/admin/ -count=1 -race` — full admin suite
passes (19 new tests across forward_server / forward_client / s3_handler
integration)
- [x] `go test . -count=1 -race` — main package passes
- [x] Targeted adapter tests pass
- [ ] Full `./adapter/` times out due to a pre-existing flake (verified
earlier on PRs #648 / #658 / #669 — unrelated to this branch)
- [ ] End-to-end smoke against a 3-node cluster — manual exercise after
#669 + this land
## Acceptance criteria coverage (Section 3.3.2 + 4.1)
| # | Criterion | Status |
|---|---|---|
| 1 | Leader direct write | ✓ for S3 (#669 / slice 2a) |
| 2 | Follower forwards transparently | ✅ for S3 (this PR) |
| 3 | Election-period 503 + retry | ✅ for S3 (this PR — same fallback as Dynamo) |
| 4 | Leader demotes stale full role | ✅ for S3 (existing `validatePrincipal` runs before bucket dispatch) |
| 5 | Rolling-upgrade compat flag | ⏳ deferred (cluster-version bump) |
| 6 | `forwarded_from` in audit log | ✅ for S3 (this PR — leader-side audit log carries it) |
## Self-review (5 lenses)
1. **Data loss**: No FSM / Raft / Pebble path changes. The leader-side
dispatcher reuses the same `BucketsSource` interface that the
leader-direct HTTP path uses; the only new code path is the JSON
envelope + principal re-validation.
2. **Concurrency**: New methods on `gRPCForwardClient` reuse the
existing `forward()` helper, so connection-cache reuse and
`ErrLeaderUnavailable` signalling behave identically across resource
types. The `runtimeServerRunner.start()` reorder is safe — each adapter
listens on its own address and the raft TCP listeners are independent.
3. **Performance**: One additional gRPC round-trip on follower-side
admin writes; identical to Dynamo's path.
4. **Data consistency**: Leader-side dispatcher applies the same strict
body decode the leader-direct HTTP path does (NUL-byte rejection,
trailing-token rejection, slash-in-name rejection).
`forwardBucketsErrorResponse` produces byte-identical envelopes to
`writeBucketsError` so forwarded and leader-direct responses are
indistinguishable.
5. **Test coverage**: 19 new tests — 9 forward-server (happy paths, 501
no-source, bad-JSON, already-exists, not-empty, slash-in-name,
missing-acl, payload-too-large sweep), 4 forward-client (3 happy-path
payload shapes + ErrLeaderUnavailable for all three), 6
handler-integration (forwarded create/delete/put-acl happy +
leader-unavailable + transport-error + 3-axis non-NotLeader gate sweep).
P2 slice 2a of [docs/design/2026_04_24_proposed_admin_dashboard.md](https://github.com/bootjp/elastickv/blob/main/docs/design/2026_04_24_proposed_admin_dashboard.md). Ships the S3 admin write endpoints so the SPA's S3 modals stop receiving 405. Slice 2b will plumb AdminForward so a follower can hand these writes off to the leader transparently.

## Summary
- **`*adapter.S3Server.AdminCreateBucket` / `AdminPutBucketAcl` / `AdminDeleteBucket`** — SigV4-bypass write methods with three in-method guards: principal must be `AdminRoleFull`, the local node must be the verified S3 leader, and bucket-name / ACL must pass the existing validators. `AdminCreateBucket` reuses the same atomic bucket-meta + ACL + generation-key txn the SigV4 path does — no new code path through the storage layer. `AdminDeleteBucket` rejects non-empty buckets (the dashboard cannot force recursive delete by design).
- **`internal/admin` write surface** — `BucketsSource` gains the three write methods + `CreateBucketRequest` / `PutBucketACLRequest` types with the documented JSON shapes. `S3Handler.serveCollection` + `servePerBucket` route POST/PUT/DELETE through dedicated handlers, with `principalForWrite` re-validating the role on every request against the live `MapRoleStore`.
- **Strict body decoder** — `decodeAdminS3JSONBody` is generic over the request type, applies `DisallowUnknownFields`, rejects NUL bytes, rejects trailing tokens, and caps at 64 KiB (matches design 4.4). Used by both POST and PUT.
- **`writeBucketsError`** — translates the source-side sentinels into the design's HTTP statuses: 403 forbidden / 503 + Retry-After:1 leader_unavailable / 404 not_found / 409 already_exists / 409 bucket_not_empty / 400 invalid_request via `*ValidationError`.
- **Bridge** — `bucketsBridge` gains write methods running through `translateAdminBucketsError`, mirroring `translateAdminTablesError` on the Dynamo side. Leader-churn sentinels from the kv coordinator route to `admin.ErrBucketsNotLeader` so the SPA's retry contract stays intact.

## What is NOT in this PR
- AdminForward integration for S3 admin writes — slice 2b.
- Rolling-upgrade compatibility flag (criterion 5) — still deferred behind a cluster-version bump.

## Test plan
- [x] `go build ./...`
- [x] `go vet ./...`
- [x] `golangci-lint run` (admin + main + adapter packages: 0 issues)
- [x] `go test ./internal/admin/ -count=1 -race` — 19 new handler tests pass
- [x] `go test . -count=1 -race` — main package passes
- [x] `go test -run "TestS3Server_Admin" ./adapter/ -count=1 -race` — 10 new adapter tests pass
- [ ] Full `go test ./adapter/` times out at 120s due to a pre-existing flake (verified earlier on PRs #648 / #658 against `main` — unrelated to this branch)
- [ ] End-to-end smoke against a 3-node cluster — slice 2b first, then a manual exercise

## Acceptance criteria coverage (Section 4.1)
| Endpoint | This PR |
|---|---|
| `GET /admin/api/v1/s3/buckets` | ✓ (#658) |
| `GET /admin/api/v1/s3/buckets/{name}` | ✓ (#658) |
| `POST /admin/api/v1/s3/buckets` | ✅ |
| `PUT /admin/api/v1/s3/buckets/{name}/acl` | ✅ |
| `DELETE /admin/api/v1/s3/buckets/{name}` | ✅ |

## Self-review (5 lenses)
1. **Data loss**: `AdminCreateBucket` reuses `s.coordinator.Dispatch` with the same `OperationGroup` shape as the SigV4 path — bucket meta + generation key in one txn. No new FSM / Pebble / Raft path. `AdminDeleteBucket`'s "must be empty" guard is a SnapshotAt scan + size check identical to the SigV4 path.
2. **Concurrency**: Writes go through `retryS3Mutation`, which already handles transient mid-dispatch leader churn. The leader check is `isVerifiedS3Leader` — the same primitive the SigV4 path uses. The role gate is re-evaluated against the live `MapRoleStore` on every request, so a key downgrade picked up between login and write is enforced immediately.
3. **Performance**: One additional load-bucket-meta read on PutACL / Delete. No hot-path changes; admin writes are operator-rate, not data-plane-rate.
4. **Data consistency**: `AdminCreateBucket` writes (BucketMetaKey, BucketGenerationKey) atomically. `AdminPutBucketAcl` mutates only `meta.Acl` and re-encodes the entire BucketMeta — generation is preserved so existing object references stay valid. `AdminDeleteBucket` removes only BucketMetaKey (BucketGenerationKey is left behind, matching the SigV4 path's behaviour — a future re-create gets a fresh generation).
5. **Test coverage**: 29 new tests (19 admin-package + 10 adapter-level) covering happy paths, role gates, leader checks, validation rejections, all four sentinel error mappings, and the cross-method missing-principal 401. The existing `TestS3Handler_DescribeBucket_SubpathReturns404` was superseded by two more precise tests now that `/acl` is a real sub-resource.
Per `docs/design/README.md`'s lifecycle convention. The original P1–P4 plan has fully shipped:

- P1 (admin skeleton + Dynamo + AdminForward) — #634 / #635 / #644 / #648
- P2 (S3 endpoints incl. write paths and AdminForward integration) — #658 / #669 / #673 / #695 (TOCTOU safety net)
- P3 (React SPA + embed) — #649 / #650
- P4 (TLS / role / CSRF / operator doc / deployment runbook / `scripts/rolling-update.sh` admin support) — #674 / #669 / #678

The `AdminDeleteBucket` TOCTOU caught during PR #669 review (the last "in-flight" item that kept the doc at _partial_) is fully resolved by the safety-net design landed in #695.

What changed:
- `git mv 2026_04_24_partial_admin_dashboard.md` → `2026_04_24_implemented_admin_dashboard.md` (history follows the rename)
- Header Status line: "Partial" → "Implemented"; explanation updated to reflect the post-fix state and the rationale for promotion
- "Last updated" bumped to 2026-04-28 with the rename trigger
- Section heading "Outstanding open items" → "Out-of-scope follow-ups" — the remaining three entries (criterion 5, object browser, TLS hot-reload) are not in-flight work; they are deferred-at-design or Non-goal items. The TOCTOU bullet is removed (resolved) and replaced with a one-line cross-link to the safety-net design + `admin_deployment.md` §4.6 contract
- Removed the closing "rename trigger" sentence — we just did the rename
- Status table: P2 row now lists #695 alongside #658/#669/#673 so a future reader can find the TOCTOU fix from the index
- Cross-references updated everywhere the old filename appeared: `docs/admin.md` (header link + Cross-references), `docs/admin_deployment.md` (header link + final cross-ref), `docs/design/2026_04_28_proposed_admin_delete_bucket_safety_net.md` (Background section pointer), `internal/admin/config.go` (Section 7.1 reference comment)

No code changes other than the comment-only filename refresh in `config.go`.
## Summary

Promote the admin dashboard design doc from `_partial_` → `_implemented_` per `docs/design/README.md`'s lifecycle convention. PR #695 landed the TOCTOU safety-net fix (the last in-flight item that kept the doc at `_partial_`), so the original P1–P4 plan is now fully shipped:

| Phase | Landed via |
|---|---|
| P1 (admin skeleton + Dynamo + AdminForward) | #634 / #635 / #644 / #648 |
| P2 (S3 endpoints + writes + AdminForward S3 + TOCTOU fix) | #658 / #669 / #673 / **#695** |
| P3 (React SPA + embed) | #649 / #650 |
| P4 (TLS / role / CSRF / operator doc / runbook / script wiring) | #674 / #669 / #678 |

The remaining three items in the doc move from "Outstanding open items" (in-flight) to **"Out-of-scope follow-ups"** (deferred-at-design or Non-goal):

- AdminForward criterion 5 — rolling-upgrade flag, deferred behind a cluster-version bump that doesn't exist yet
- S3 object browser — Non-goal per §2.2
- TLS cert hot-reload — out of scope per `docs/admin.md`

## Changes
- `git mv` partial → implemented (history follows the rename)
- Status line / Last-updated / status-table / Out-of-scope section content reflects the promotion
- Cross-references updated in all 4 referencing files: `docs/admin.md`, `docs/admin_deployment.md`, `docs/design/2026_04_28_proposed_admin_delete_bucket_safety_net.md`, `internal/admin/config.go` (comment-only)

## Test plan
- [x] No code changes other than a comment-only filename refresh in `config.go`
- [x] `go build ./...` passes
- [x] `golangci-lint run ./internal/admin/...` — 0 issues
- [x] `grep -rn "2026_04_24_partial\|2026_04_24_proposed_admin"` returns nothing — no stale references
Phase 3 of Task #26 / AdminForward design 3.3. Closes the wiring loop opened by #635 (leader-side dispatcher) and #644 (follower-side client + handler integration): the admin HTTP `DynamoHandler` now hands `ErrTablesNotLeader` writes off to the in-process `LeaderForwarder`, and the leader-side gRPC `AdminForward` service is registered on every node so the forwarded RPC actually lands somewhere.

## Summary
- `admin.ServerDeps.Forwarder` (3315bf4) — new optional field on the high-level admin server composition surface. `admin.NewServer` calls `WithLeaderForwarder` when set; nil leaves the existing 503 + Retry-After:1 fallback untouched. Single-node / leader-only deployments are unaffected.
- `main_admin_forward.go` (f3e529c) — production bridge between `kv.GRPCConnCache` and the admin layer's typed `PBAdminForwardClient` (via `pb.NewAdminForwardClient`). Includes `buildLeaderForwarder`, `roleStoreFromFlags`, `adminForwardServerDeps`, and a `readyForRegistration` gate.
- `runtimeServerRunner.start()` now creates `dynamoServer` BEFORE `startRaftServers`, so the leader-side `ForwardServer` can use it as its `TablesSource`.
- `startRaftServers` accepts the new `adminForwardServerDeps` bundle and delegates to `registerAdminForwardServer`.
- `startServers` constructs a single `*kv.GRPCConnCache` and forwards it (with the `Coordinator`) to `startAdminFromFlags`, which in turn builds the production `LeaderForwarder` via `buildAdminLeaderForwarder` and passes it through `ServerDeps.Forwarder`.

## What is NOT in this PR
## Test plan
- [x] `go build ./...`
- [x] `go vet ./...`
- [x] `golangci-lint run` (main + admin packages: 0 issues)
- [x] `go test ./internal/admin/ -count=1 -race` — full admin suite passes (existing 21 forward tests + 1 new `TestServer_ServerDepsForwarderIsWired`)
- [x] `go test . -count=1 -race` — main package passes (4 new bridge / role-store / readyForRegistration tests)
- [ ] `adapter` package times out at 240s on this branch AND on `main` — pre-existing flake unrelated to this PR (verified by checking out main and running the same suite)

## Acceptance criteria coverage
- `forwarded_from` in audit log — covered: the local `--raftId` is stamped onto every forwarded request
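The conn-factory bridge this PR lands — adapting a connection cache that hands back raw connections into the typed-client factory the admin layer wants — can be sketched as below. `RawConn`, `ConnCache`, `ForwardClient`, and `ClientFor` are simplified stand-ins for `*grpc.ClientConn`, `kv.GRPCConnCache`, the generated `pb` client, and the factory method; only the adapter shape and the empty-address rejection are the point:

```go
package main

import "fmt"

// RawConn stands in for *grpc.ClientConn; ConnCache for kv.GRPCConnCache.
type RawConn struct{ addr string }

type ConnCache struct {
	conns map[string]*RawConn
}

// Get returns a cached connection for addr, dialing (here: allocating)
// on first use. An empty leader address is rejected up front.
func (c *ConnCache) Get(addr string) (*RawConn, error) {
	if addr == "" {
		return nil, fmt.Errorf("empty leader address")
	}
	if conn, ok := c.conns[addr]; ok {
		return conn, nil
	}
	conn := &RawConn{addr: addr}
	c.conns[addr] = conn
	return conn, nil
}

// ForwardClient stands in for the generated typed gRPC client.
type ForwardClient struct{ conn *RawConn }

// connFactory is the bridge: the cache hands back raw connections,
// the admin layer wants typed clients, and the bridge is just the
// generated-constructor call in between.
type connFactory struct{ cache *ConnCache }

func (f *connFactory) ClientFor(leaderAddr string) (*ForwardClient, error) {
	conn, err := f.cache.Get(leaderAddr)
	if err != nil {
		return nil, err // empty-address path: no client, caller retries
	}
	return &ForwardClient{conn: conn}, nil
}

func main() {
	f := &connFactory{cache: &ConnCache{conns: map[string]*RawConn{}}}
	c, err := f.ClientFor("10.0.0.1:9000")
	fmt.Println(c != nil, err) // true <nil>
}
```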
## Self-review (5 lenses)
1. **Data loss**: Forwarded writes become the same `AdminCreateTable` / `AdminDeleteTable` Raft proposal once they reach the leader.
2. **Concurrency**: The `runner.start()` reorder is confirmed safe — `startDynamoDBServer` only depends on `coordinate` + `shardStore` (already constructed); raft TCP listeners are independent of the dynamo HTTP listener. Worst case during the 100 ms-or-so reorder window is the same "no leader yet" 503 the old order would have produced if a request landed before raft converged.
3. **Performance**: One `&kv.GRPCConnCache{}` allocation per process; one closure for the resolver. No hot-path changes.
4. **Data consistency**: The `AdminCreateTable` path does its own `HLC.Next`.
5. **Test coverage**: `TestServer_ServerDepsForwarderIsWired` is the regression test for "future refactor drops the forwarder before it reaches the dynamo handler" — exactly the failure mode the wiring change in this PR introduces the risk of.
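The `readyForRegistration` gate — skip gRPC service registration entirely when a nil `TablesSource` (cluster-only build) or a nil `RoleStore` (admin auth disabled) would make every forwarded call fail — might look like the sketch below. The interface shapes are hypothetical stand-ins; only the nil-check truth table is the point:

```go
package main

import "fmt"

// Simplified stand-ins for the dependency bundle's fields.
type TablesSource interface{ Name() string }
type RoleStore interface {
	Lookup(accessKey string) (role string, ok bool)
}

type adminForwardDeps struct {
	Tables TablesSource
	Roles  RoleStore
}

// readyForRegistration gates gRPC service registration: registering
// with a nil TablesSource or nil RoleStore would make every forwarded
// call fail, so the caller silently skips registration instead.
func (d adminForwardDeps) readyForRegistration() bool {
	return d.Tables != nil && d.Roles != nil
}

type fakeTables struct{}

func (fakeTables) Name() string { return "tables" }

type fakeRoles struct{}

func (fakeRoles) Lookup(string) (string, bool) { return "", false }

func main() {
	fmt.Println(adminForwardDeps{}.readyForRegistration())                                          // false
	fmt.Println(adminForwardDeps{Tables: fakeTables{}}.readyForRegistration())                      // false
	fmt.Println(adminForwardDeps{Tables: fakeTables{}, Roles: fakeRoles{}}.readyForRegistration()) // true
}
```

Skipping registration rather than registering a half-wired service keeps the failure mode at the forwarder ("leader unavailable") instead of turning every forwarded call into a 500 on the leader.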