admin: AdminForward follower-side client + handler integration (P1, partial)#644
admin: AdminForward follower-side client + handler integration (P1, partial)#644
Conversation
Adds the LeaderForwarder interface and a gRPC-backed implementation that the dynamo HTTP handler will invoke when the local node is a follower (the source returned ErrTablesNotLeader). This commit only introduces the client + its tests; wiring it into the handler and the main_admin.go bridge is the next phase. Pieces: - LeaderForwarder interface with ForwardCreateTable / ForwardDeleteTable, decoupled from proto so the handler stays proto-free. - ForwardResult is the transport-neutral envelope the handler re-emits verbatim, so a forwarded request looks identical to a leader-direct call from the SPA's perspective. - ErrLeaderUnavailable for the "no leader known" path (acceptance criterion 3 — the handler will turn this into 503 + Retry-After: 1 once wired up). - gRPCForwardClient builds the AdminForwardRequest from the admin CreateTableRequest / table name + AuthPrincipal, dials via a GRPCConnFactory abstraction, and re-shapes the response. - forwarded_from is populated with the local node id so the leader's audit log line carries the trace (criterion 6, leader side already in #635). - Defensive: status_code == 0 (proto zero value) is upgraded to 502 Bad Gateway; missing ContentType is filled with the JSON default. Both surface transport bugs rather than producing silently-malformed SPA responses. Tests cover: - constructor input validation (3 cases) - happy paths for both Forward operations including principal, payload, op enum, and forwarded_from round-trip - ErrLeaderUnavailable when the resolver returns "" - dial / rpc errors propagated with cockroachdb/errors wrapping - zero status code upgrade - missing content type fallback Not in this commit: handler integration, gRPC server registration in main.go, election-period retry. Those land in follow-ups before the PR opens.
Wire the follower-side forwarder from phase 1 into the dynamo HTTP handler so writes hitting a follower transparently dispatch to the leader. Acceptance criteria 2 + 3 from design Section 3.3.2. Changes: - DynamoHandler gains a forwarder field plus WithLeaderForwarder option. nil keeps the previous "503 leader_unavailable" behaviour, so a build that ships only the leader can still deploy unchanged. - handleCreate / handleDelete catch ErrTablesNotLeader from the source and route to the forwarder. The principal already in context flows through to the leader's audit log unchanged. - writeForwardResult re-emits the leader's structured response verbatim — status, payload, and content-type all come from the gRPC response so a forwarded request looks identical to a leader-direct call from the SPA's perspective. - writeForwardFailure handles the two non-success paths: ErrLeaderUnavailable (election in flight, criterion 3) and generic gRPC transport errors. Both produce 503 + Retry-After: 1 so the SPA's retry policy is uniform regardless of where in the chain the 503 came from. ErrLeaderUnavailable is NOT logged as an error since elections are routine; transport errors are logged at LevelError so operators can investigate. - A 503 surfacing via the leader's structured response (e.g., leader stepped down mid-request) also gets Retry-After: 1 added on the way out so the wire contract is the same. Tests cover criteria 2 (transparent forwarding for both create and delete), 3 (election-period 503 + Retry-After), the leader-direct error pass-through (409 conflict), the no-forwarder fallback, and confirm the role / body validations short-circuit *before* the forwarder is invoked. Stub LeaderForwarder + a not-leader source make the tests deterministic without spinning up a real Raft cluster.
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 45 minutes and 48 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@claude review |
|
Claude finished @bootjp's task in 3m 28s —— View job Code Review:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a4982a8c22
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| Forward(ctx context.Context, in *pb.AdminForwardRequest, opts ...grpcCallOption) (*pb.AdminForwardResponse, error) | ||
| } | ||
|
|
||
| // grpcCallOption is a re-export of google.golang.org/grpc.CallOption | ||
| // kept private so callers do not import proto's grpc dep directly. |
There was a problem hiding this comment.
Match PBAdminForwardClient opts type to grpc.CallOption
PBAdminForwardClient uses ...grpcCallOption, but grpcCallOption is currently aliased to interface{} rather than grpc.CallOption. In this shape, generated pb.AdminForwardClient does not satisfy the interface (different method signature), so wiring the real gRPC client in the production bridge will fail to compile when it tries to return pb.NewAdminForwardClient(...) as a PBAdminForwardClient.
Useful? React with 👍 / 👎.
Four findings on the follower-side forward client + handler
integration:
- High: PBAdminForwardClient declared opts as `...interface{}`
via `type grpcCallOption = interface{}`, which is incompatible
with the proto-generated *adminForwardClient's
`...grpc.CallOption`. Go interface satisfaction requires exact
variadic element match, so the production bridge would have
failed to compile when assigning pb.NewAdminForwardClient(conn)
to PBAdminForwardClient. Switch to grpc.CallOption (already a
transitive dependency, no new import surface).
- Medium: writeForwardResult re-emitted Content-Type and
Cache-Control but dropped X-Content-Type-Options: nosniff that
writeAdminJSONStatus sets on the leader-direct path. Added
parity. New test
TestDynamoHandler_ForwarderForwardsLeaderResponseSetsNosniff
pins the header.
- Low: missing test for "forwarder configured AND source returns
non-ErrTablesNotLeader". Added
TestDynamoHandler_ForwarderNotInvokedForNonNotLeaderError —
table-driven sweep over already-exists, validation, and
generic error, each confirming the forwarder is never invoked
and the appropriate non-503 status is returned.
- Nit: WithLeaderForwarder's nil semantics differ from
WithLogger's. Added a doc comment explaining the intentional
asymmetry — nil forwarder is a meaningful "disable" state, nil
logger preserves the slog.Default() seed.
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
@claude review |
|
Claude finished @bootjp's task in 2m 52s —— View job Code Review:
|
| Previous finding | Status |
|---|---|
grpcCallOption = interface{} blocks production bridge |
✅ Fixed: grpc.CallOption alias + correct import (forward_client.go:11,83) |
writeForwardResult drops X-Content-Type-Options: nosniff |
✅ Fixed: added at dynamo_handler.go:482 |
No test for non-ErrTablesNotLeader gate |
✅ Fixed: TestDynamoHandler_ForwarderNotInvokedForNonNotLeaderError covers ErrTablesAlreadyExists, ValidationError, opaque errors |
Asymmetry in WithLeaderForwarder nil-guard undocumented |
✅ Fixed: comment at dynamo_handler.go:215-222 |
What's working well
PBAdminForwardClientnow matches the proto-generatedadminForwardClientsignature exactly —pb.NewAdminForwardClient(conn)will satisfy the interface and the next PR's bridge will compile.writeForwardFailuredelegates towriteJSONError, which already setsnosniff+Cache-Control: no-store(router.go:283-285) — the error path has full security parity with the success path.ErrLeaderUnavailableis returned unwrapped inforward()and checked with stdliberrors.IsinwriteForwardFailure. Thepkgerrors.Wrapcalls only wrap the dial/RPC errors, so the sentinel check is always exact and correct.- The
writeForwardResultRetry-After injection on 503 (line 488-490) ensures the criterion-3 contract holds uniformly regardless of whether the 503 originated at the follower or was relayed from the leader. TestDynamoHandler_ForwarderForwardsLeaderResponseSetsNosniffdirectly pins the security parity.TestDynamoHandler_ForwarderNotInvokedForNonNotLeaderErroris table-driven with three error classes — locking down the gate against accidental removal.
The only blocker for the next PR is the DELETE coverage gap above. The production bridge wiring can proceed once that's locked down.
Claude's follow-up review on PR #644 noted that TestDynamoHandler_ForwarderNotInvokedForNonNotLeaderError only exercised the create side, leaving tryForwardDelete's matching guard untested. The DELETE forward error paths (ErrLeaderUnavailable, generic transport error) also had no direct coverage. Close the gap so a typo-class change to tryForwardDelete or writeForwardFailure on the delete side fails a test. Three additions: - Extend TestDynamoHandler_ForwarderNotInvokedForNonNotLeaderError to a two-axis sweep: 3 create cases (already_exists, validation, generic) × 3 delete cases (not_found, forbidden, generic) — 6 sub-tests pinning that the forward path runs ONLY on ErrTablesNotLeader. - TestDynamoHandler_ForwarderLeaderUnavailableReturns503_Delete: 503 + Retry-After: 1 + leader_unavailable body when the forwarder reports election in flight. - TestDynamoHandler_ForwarderTransportErrorReturns503_Delete: 503 + Retry-After: 1 + no internal detail leaking to the SPA when the forwarder gRPC transport fails. All three reuse the existing stubLeaderForwarder + notLeaderSource fixtures — no new test scaffolding needed.
|
@claude review |
|
Claude finished @bootjp's task in 3m 51s —— View job Code Review:
|
| Previous finding | Status |
|---|---|
TestDynamoHandler_ForwarderLeaderUnavailableReturns503_Delete missing |
✅ Added at forward_integration_test.go:239 |
TestDynamoHandler_ForwarderTransportErrorReturns503_Delete missing |
✅ Added at forward_integration_test.go:262 |
TestDynamoHandler_ForwarderNotInvokedForNonNotLeaderError only covered CREATE gate |
✅ Now covers DELETE in same table-driven sweep (forward_integration_test.go:202–231) |
Medium: Stale comment in writeTablesError — ErrTablesNotLeader branch
internal/admin/dynamo_handler.go:530-536
case errors.Is(err, ErrTablesNotLeader):
// The follower→leader forwarding RPC (design 3.3) will
// catch this case in a follow-up PR. Until then, surface
// 503 + Retry-After: 1 so the SPA / curl can re-issue.This comment was written before tryForwardCreate / tryForwardDelete existed. It's now misleading: the forwarder already catches this case in this PR when h.forwarder != nil. The ErrTablesNotLeader arm of writeTablesError is now only reachable when h.forwarder == nil (no-forwarder fallback path, tested by TestDynamoHandler_NoForwarder_FallsBackTo503). A future reader tracking the forwarding design will be confused by "will catch this case in a follow-up PR" when it's already caught here.
Suggested update:
case errors.Is(err, ErrTablesNotLeader):
// Reached only when no LeaderForwarder is configured (single-
// node or leader-only deployments). When a forwarder is wired,
// tryForwardCreate / tryForwardDelete intercept this error
// before writeTablesError is called.
w.Header().Set("Retry-After", "1")
writeJSONError(w, http.StatusServiceUnavailable, "leader_unavailable",
"this admin node is not the raft leader")Nit: TestDynamoHandler_ForwarderTransportErrorReturns503 (create-side) is missing Retry-After assertion
forward_integration_test.go:147-161
The create-side transport-error test does not assert Retry-After: 1, but its DELETE mirror (TestDynamoHandler_ForwarderTransportErrorReturns503_Delete, line 275) does. The header is set by writeForwardFailure on both paths — this is an assertion gap, not a behavior bug. Consistent assertion coverage prevents a future change that accidentally drops Retry-After from only writeForwardFailure (not writeJSONError) from going undetected on the create path.
require.Equal(t, http.StatusServiceUnavailable, rec.Code)
require.Equal(t, "1", rec.Header().Get("Retry-After")) // ← add this
require.NotContains(t, rec.Body.String(), "TX-1")Nit: Unnecessary tables map in notLeaderSource-backed DELETE forwarder error tests
forward_integration_test.go:241-243, 265-267
src := ¬LeaderSource{stubTablesSource: stubTablesSource{
tables: map[string]*DynamoTableSummary{"users": {Name: "users"}},
}}notLeaderSource.AdminDeleteTable overrides stubTablesSource.AdminDeleteTable and unconditionally returns ErrTablesNotLeader without ever consulting tables. The map initialisation is a no-op for these tests. Using ¬LeaderSource{} (or the newFollowerHandler helper, which already constructs a correctly-typed notLeaderSource) would make the intent clearer.
What's working well
- All four previous review issues are fully addressed.
grpcCallOption = grpc.CallOption,nosniffparity,TestDynamoHandler_ForwarderNotInvokedForNonNotLeaderErrortable-driven gate, and theWithLeaderForwardernil asymmetry comment. - The DELETE path now has full error-path parity with CREATE: happy path,
ErrLeaderUnavailable→ 503 + Retry-After, transport error → 503 + no leakage, non-ErrTablesNotLeadergate lock-down. tryForwardDeleteis structurally identical totryForwardCreate— an intentional duplication that keeps both paths independently testable without shared helper indirection.writeForwardResultcorrectly guards the 204 body withlen(res.Payload) > 0so DELETE no-content responses don't write a spurious body.ErrLeaderUnavailableis returned unwrapped fromforward();errors.IsinwriteForwardFailurematches it exactly with no fragility through wrapping layers.- 13 integration tests + 8 forward-client unit tests (21 total), all deterministic, no TCP or Raft required.
The two nits and the stale comment are the only remaining findings. None block the next PR's bridge wiring.
Three follow-ups from Claude review on a8de663: 1) writeTablesError ErrTablesNotLeader comment said "follow-up PR" but the forwarder is in this PR. The arm is now reached only when no LeaderForwarder is wired (single-node / leader-only deployments); when one is configured, tryForwardCreate / tryForwardDelete intercept the error before writeTablesError is called. Refreshed the comment to reflect that. 2) TestDynamoHandler_ForwarderTransportErrorReturns503 (CREATE-side) was missing the Retry-After:1 assertion that its DELETE counterpart already had. Added it for parity, guards against a future change that drops Retry-After from writeForwardFailure on only one path. 3) TestDynamoHandler_ForwarderLeaderUnavailableReturns503_Delete and TestDynamoHandler_ForwarderTransportErrorReturns503_Delete passed a tables map into notLeaderSource, but notLeaderSource.AdminDeleteTable unconditionally returns ErrTablesNotLeader without consulting it. Dropped the no-op map init.
|
@claude review 87043f9 addresses the three findings from your previous review:
|
|
Claude finished @bootjp's task in 2m 13s —— View job Code Review:
|
| Previous finding | Status |
|---|---|
writeTablesError / ErrTablesNotLeader arm had "follow-up PR" comment |
✅ Updated to "Reached only when no LeaderForwarder is configured…" (dynamo_handler.go:531-534) |
TestDynamoHandler_ForwarderTransportErrorReturns503 missing Retry-After: 1 assertion |
✅ Added at forward_integration_test.go:159 |
Two DELETE forwarder error tests initialised a no-op tables map on notLeaderSource |
✅ Cleaned up: both tests now use bare ¬LeaderSource{} (forward_integration_test.go:245,266) |
Nit: ErrTablesNotLeader sentinel comment is now stale
internal/admin/dynamo_handler.go:121-123
// ErrTablesNotLeader is returned when the local node is not the
// Raft leader. Maps to 503 + Retry-After: 1 today; the future
// AdminForward RPC catches this as the trigger to forward.
ErrTablesNotLeader = errors.New("admin tables: local node is not the raft leader")The writeTablesError comment was updated (✅), but the sentinel's own declaration comment wasn't. "The future AdminForward RPC catches this as the trigger to forward" describes a state that now exists — the trigger is tryForwardCreate / tryForwardDelete in this PR.
Suggested update:
// ErrTablesNotLeader is returned when the local node is not the
// Raft leader. When a LeaderForwarder is configured, tryForwardCreate
// and tryForwardDelete catch this before it reaches writeTablesError
// and forward the request to the leader transparently. Without a
// forwarder, maps to 503 + Retry-After: 1.Nit: newFollowerHandler still carries an empty tables map
internal/admin/forward_integration_test.go:66
src := ¬LeaderSource{stubTablesSource: stubTablesSource{tables: map[string]*DynamoTableSummary{}}}The two individual tests that had the same pattern were cleaned up to ¬LeaderSource{}. This helper still carries the empty map, which notLeaderSource's overridden methods never consult. The inconsistency isn't a bug, but a reader comparing the helper to the individual tests could wonder if the map has meaning here. This is purely cosmetic — ¬LeaderSource{} would be consistent with the surrounding tests.
What's working well
- The
writeTablesErrorstale comment is now precise: the arm is only reached when no forwarder is wired, and the comment says exactly that. - The
Retry-After: 1assertion inTestDynamoHandler_ForwarderTransportErrorReturns503now matches its DELETE counterpart — both paths lock down the criterion-3 contract uniformly. - Bare
¬LeaderSource{}in the DELETE error tests correctly signals that the maps are unused, making the test intent easier to read. - All 21 tests (8 unit + 13 integration) remain deterministic and free of TCP/Raft dependencies.
The sentinel comment nit is the only remaining finding. It doesn't block the next PR's bridge wiring.
Two nit follow-ups from Claude review #4 on 87043f9: 1) The ErrTablesNotLeader sentinel's own declaration comment still said "the future AdminForward RPC catches this as the trigger to forward". The previous review pass updated only the writeTablesError site comment. Refreshed the sentinel comment too: when a LeaderForwarder is configured, tryForwardCreate / tryForwardDelete catch this before writeTablesError; without a forwarder, the arm maps to 503 + Retry-After:1. 2) newFollowerHandler still constructed notLeaderSource with an empty tables map. notLeaderSource overrides AdminCreateTable / AdminDeleteTable to unconditionally return ErrTablesNotLeader, so the map is never consulted. The two individual tests cleaned up in the previous commit already use bare ¬LeaderSource{}; the helper now matches.
|
@claude review f901d70 addresses both nits from Review #4:
|
|
Claude finished @bootjp's task in 2m 27s —— View job Code Review:
|
| Previous finding | Status |
|---|---|
ErrTablesNotLeader sentinel declaration comment said "future AdminForward RPC catches this" |
✅ Updated to "When a LeaderForwarder is configured, tryForwardCreate / tryForwardDelete catch this…" (dynamo_handler.go:121–125) |
newFollowerHandler still used ¬LeaderSource{stubTablesSource: stubTablesSource{tables: …}} |
✅ Now bare ¬LeaderSource{} with a comment explaining why the map is unneeded (forward_integration_test.go:69) |
Nit: Delete-path gate test has tables map; create-path does not
internal/admin/forward_integration_test.go:194 vs 220-222
The create-path cases in TestDynamoHandler_ForwarderNotInvokedForNonNotLeaderError use:
src := &stubTablesSource{createErr: tc.err}But the delete-path cases use:
src := &stubTablesSource{
tables: map[string]*DynamoTableSummary{"users": {Name: "users"}},
deleteErr: tc.err,
}stubTablesSource.AdminDeleteTable returns early when deleteErr != nil (before the tables lookup), so the map is unreachable in all three delete cases (ErrTablesNotFound, ErrTablesForbidden, errors.New("opaque")). Using &stubTablesSource{deleteErr: tc.err} would be consistent with the create-path style. Not a blocker — just an asymmetry that mildly suggests the map matters when it doesn't.
Summary
Five reviews, no open issues. The PR is ready to merge from the admin-package perspective:
grpcCallOption = grpc.CallOption— production bridge will compileX-Content-Type-Options: nosniffparity on forwarded path- DELETE error-path coverage matches CREATE (three cases each, plus
ErrLeaderUnavailableand transport-error paths) - All four stale comments (sentinel,
writeTablesErrorarm) updated to reflect that forwarding ships in this PR newFollowerHandlerand the two DELETE forwarder-error tests use bare¬LeaderSource{}- 21 deterministic tests; no TCP or Raft required
One nit follow-up from Claude review #5 on f901d70: The delete-path cases in TestDynamoHandler_ForwarderNotInvokedForNonNotLeaderError constructed stubTablesSource with both a tables map and deleteErr. stubTablesSource.AdminDeleteTable returns early on deleteErr != nil before consulting the map, so the map was unreachable in all three delete cases. Now matches the create-path style (deleteErr only), removing a mild "the map matters" misread.
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
Claude finished @bootjp's task in 1m 57s —— View job Code Review:
|
| Previous finding | Status |
|---|---|
Delete-path gate test initialized an unreachable tables map on stubTablesSource while the create-path didn't |
✅ Dropped: src := &stubTablesSource{deleteErr: tc.err} now matches the create-path style (forward_integration_test.go:224) |
The added comment (forward_integration_test.go:220–223) explicitly explains why the map was removed, which is exactly what was flagged — preventing a future reader from wondering if the omission was an oversight.
Complete findings tracker across all 6 reviews
| # | Finding | Resolution |
|---|---|---|
| R1-High | grpcCallOption = interface{} blocks production bridge compile |
✅ grpc.CallOption alias + import |
| R1-Medium | writeForwardResult drops X-Content-Type-Options: nosniff |
✅ Added |
| R1-Low | No test for "forwarder wired + non-ErrTablesNotLeader source error" |
✅ TestDynamoHandler_ForwarderNotInvokedForNonNotLeaderError |
| R1-Nit | WithLeaderForwarder nil-guard asymmetry undocumented |
✅ Comment added |
| R2-Medium | DELETE path error cases uncovered (ErrLeaderUnavailable, transport, non-NotLeader gate) |
✅ Three new delete-path tests |
| R3-Medium | Stale "follow-up PR" comment in writeTablesError ErrTablesNotLeader arm |
✅ Updated |
| R3-Nit | Create-side transport-error test missing Retry-After: 1 assertion |
✅ Added |
| R3-Nit | Two DELETE forwarder-error tests had no-op tables map on notLeaderSource |
✅ Dropped |
| R4-Nit | ErrTablesNotLeader sentinel declaration comment still said "future AdminForward RPC" |
✅ Updated |
| R4-Nit | newFollowerHandler still carried empty tables map |
✅ Dropped |
| R5-Nit | Delete-path gate test stubTablesSource had unreachable tables map |
✅ Dropped (e7f017e) |
Summary
Six reviews, zero open issues. The PR is ready to merge:
grpcCallOption = grpc.CallOption—pb.NewAdminForwardClient(conn)satisfiesPBAdminForwardClient; the next PR's bridge will compile.- Full
nosniff/Retry-After/Cache-Controlparity between the forwarded and leader-direct paths. - DELETE error-path coverage is symmetric with CREATE across all meaningful error classes.
- All stale "follow-up PR" language updated to reflect that forwarding ships in this PR.
newFollowerHandler, the twonotLeaderSourceDELETE tests, and now the gate tests are all consistent: no phantom map initializations.- 21 deterministic tests (8 unit + 13 integration); no TCP or Raft required.
…de bridge (#648) Phase 3 of Task #26 / AdminForward design 3.3. Closes the wiring loop opened by #635 (leader-side dispatcher) and #644 (follower-side client + handler integration): the admin HTTP DynamoHandler now hands `ErrTablesNotLeader` writes off to the in-process `LeaderForwarder`, and the leader-side gRPC `AdminForward` service is registered on every node so the forwarded RPC actually lands somewhere. ## Summary - **`admin.ServerDeps.Forwarder`** (3315bf4) — new optional field on the high-level admin server composition surface. `admin.NewServer` calls `WithLeaderForwarder` when set; nil leaves the existing 503 + Retry-After:1 fallback untouched. Single-node / leader-only deployments are unaffected. - **`main_admin_forward.go`** (f3e529c) — production bridge between `kv.GRPCConnCache` and the admin layer's typed `PBAdminForwardClient` (via `pb.NewAdminForwardClient`). Includes `buildLeaderForwarder`, `roleStoreFromFlags`, `adminForwardServerDeps`, and a `readyForRegistration` gate. - **Runtime wiring** (19f6b70) — `runtimeServerRunner.start()` now creates `dynamoServer` BEFORE `startRaftServers`, so the leader-side `ForwardServer` can use it as its `TablesSource`. `startRaftServers` accepts the new `adminForwardServerDeps` bundle and delegates to `registerAdminForwardServer`. `startServers` constructs a single `*kv.GRPCConnCache` and forwards it (with the `Coordinator`) to `startAdminFromFlags`, which in turn builds the production `LeaderForwarder` via `buildAdminLeaderForwarder` and passes it through `ServerDeps.Forwarder`. ## What is NOT in this PR - Rolling-upgrade compatibility flag (criterion 5) — still deferred behind a cluster-version bump. - Election-period retry loop on the follower's bridge — the handler returns 503 + Retry-After:1, the SPA / curl re-issues, and the bridge dials again on the next attempt. Inline retry inside the handler would just hide the latency from operators. ## Test plan - [x] `go build ./...` - [x] `go vet ./...` - [x] `golangci-lint run` (main + admin packages: 0 issues) - [x] `go test ./internal/admin/ -count=1 -race` — full admin suite passes (existing 21 forward tests + 1 new `TestServer_ServerDepsForwarderIsWired`) - [x] `go test . -count=1 -race` — main package passes (4 new bridge / role-store / readyForRegistration tests) - [ ] `adapter` package times out at 240s on this branch AND on `main` — pre-existing flake unrelated to this PR (verified by checking out main and running the same suite) - [ ] End-to-end smoke against a 3-node cluster — needs the local Jepsen runner; blocked on the next PR or a manual run ## Acceptance criteria coverage | # | Criterion | This PR | |---|---|---| | 1 | Leader direct write | ✓ (in main since #634) | | 2 | Follower forwards transparently | ✓ — wiring complete; both leader-side gRPC service AND follower-side bridge are now plumbed | | 3 | Election-period 503 + retry | ✓ — handler still returns 503 + Retry-After:1 when no leader is known; the SPA / client retries and the next attempt re-dials | | 4 | Leader demotes stale full role | ✓ (in main since #635) | | 5 | Rolling-upgrade compat flag | ⏳ deferred (cluster-version bump) | | 6 | `forwarded_from` in audit log | ✓ (in main since #635 — bridge now stamps `--raftId` onto every forwarded request) | ## Self-review (5 lenses) 1. **Data loss**: No FSM / Raft / Pebble path changes; admin writes still go through the same `AdminCreateTable` / `AdminDeleteTable` Raft proposal once they reach the leader. 2. **Concurrency**: `runner.start()` reorder confirmed safe — `startDynamoDBServer` only depends on `coordinate` + `shardStore` (already constructed); raft TCP listeners are independent of the dynamo HTTP listener. Worst case during the 100 ms-or-so reorder window is the same "no leader yet" 503 the old order would have produced if a request landed before raft converged. 3. **Performance**: One additional `&kv.GRPCConnCache{}` allocation per process; one closure for the resolver. No hot-path changes. 4. **Data consistency**: Forwarded requests re-validate the principal at the leader (criterion 4, already shipped); commit-ts ordering is unchanged because the leader's `AdminCreateTable` path does its own `HLC.Next`. 5. **Test coverage**: 4 new main-package tests + 1 new admin-package server-level test. `TestServer_ServerDepsForwarderIsWired` is the regression test for "future refactor drops the forwarder before it reaches the dynamo handler" — exactly the failure mode the wiring change in this PR introduces the risk of. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added leader-forwarding capability for admin requests sent to follower nodes * Implemented role-based access control for admin API operations * Enhanced DynamoDB request routing to automatically direct operations to leader nodes * **Tests** * Added comprehensive integration and unit tests validating leader-forwarding paths <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary Phase **P3** of [docs/design/2026_04_24_proposed_admin_dashboard.md](https://github.com/bootjp/elastickv/blob/main/docs/design/2026_04_24_proposed_admin_dashboard.md). The Go side (auth / JWT / cluster handler / dynamo handler / follower→leader forwarding / `embed.go` skeleton) shipped in earlier PRs (#644 et al.) but `StaticFS` was wired as `nil`, so `/admin/*` answered 404 for any non-API path. This PR ships the React bundle and flips `StaticFS` on. - React 18 + TypeScript + Vite, Tailwind CSS for tokens (Section 5.1). No UI framework dependency — bundle stays at **~60 KB gzip** vs the doc's 150 KB ceiling (Section 7.2). - `vite.config.ts` writes its build output straight into `internal/admin/dist/` so `npm run build && go build` produces a single-binary release. - Five pages per Section 5.2: Login, Dashboard, DynamoDB list/detail (with Create + Delete dialogs), S3 list/detail (placeholder — endpoints not yet wired, the SPA renders a soft "endpoint pending" notice on the 404). - Cookie-based session (Section 6): `admin_session` (HttpOnly) handled by the browser; `admin_csrf` value echoed in `X-Admin-CSRF` for every POST/PUT/DELETE per the double-submit pattern. Read-only sessions hide write affordances; backend re-evaluates the role. - `.gitignore` keeps the placeholder `internal/admin/dist/index.html` committed (so `//go:embed all:dist` always has a match in a fresh clone) and excludes the hashed Vite asset outputs and npm/vite caches — committed bundles drift from source. ## Test plan - [x] `cd web/admin && npm install && npm run build` — 192 KB JS / 60 KB gzip + 14 KB CSS - [x] `cd web/admin && npm run lint` — `tsc --strict --noUnusedLocals --noUncheckedSideEffectImports` - [x] `go build ./...` — clean (placeholder + built bundle both compile) - [x] `go test -race -count=1 ./internal/admin/...` - [x] `golangci-lint run ./internal/admin/...` - [ ] Manual smoke: start admin listener with `--adminEnabled --adminAllowInsecureDevCookie --adminListen 127.0.0.1:8080`, visit `http://127.0.0.1:8080/admin/`, log in with a SigV4 access key, confirm dashboard / dynamo list / dynamo detail / dynamo create / dynamo delete + see graceful "endpoint pending" on the S3 pages. ## Notes - **Self-review (5 lenses)** — these changes only ship UI + 1 line of Go wiring (`StaticFS` activation). No replication, MVCC, OCC, or HLC code paths touched. Lenses 1 (data loss) / 2 (concurrency) / 4 (consistency) → no impact. Lens 3 (perf) → +1 cold read of `embed.FS` per startup, ~1 MB binary growth (under the doc's 1–2 MB target). Lens 5 (test coverage) → SPA-side Vitest is documented as a P4 follow-up in the design doc; this PR does not regress any existing Go test. - The S3 admin endpoints, follower→leader forwarding for write paths, and the design-doc lifecycle rename (`proposed_` → `partial_`) are out of scope here and are tracked elsewhere.
Five issues raised by Claude on PR #664 (one medium, four low — all docs-only edits to the partial doc trio): 3.D split-queue FIFO doc: 1. (Medium) §8.1 stale failure-mode description contradicted §4.2. Original §8.1 said "Consumer never sees a partition's leader → mitigation: round-robin endpoints" — but §4.2 was rewritten earlier in this PR to proxy non-local partitions server-side, eliminating the false-empty risk entirely. Implementors reading §8.1 in isolation would have built the wrong mental model and possibly suppressed the proxy logic. §8.1 now describes the actual remaining failure mode — proxy round-trip latency under spread deployments — and explains that single-partition / co-located deployments pay nothing. 2. (Low) §3.1 variadic sqsMsgDataKey was a footgun. The variadic form would silently accept extra args without compiler diagnosis. Replaced with two named constructors (legacyMsgDataKey / partitionedMsgDataKey) and an explicit dispatch at the call site based on meta.PartitionCount > 1. 3. (Low) §3.1 p| discriminator safety constraint was implicit. Added an explicit paragraph explaining why the prefix cannot collide with the legacy keyspace: queue names admit only [A-Za-z0-9_-] (+ optional .fifo suffix), the validator must continue to reject |, and base32 segments never start with the literal ASCII p. Future relaxations of the name validator have to revisit this prefix scheme. 4. (Low) §8.5 upgrade-gate mechanism was underspecified. Added the concrete protocol: each node's /sqs_health endpoint advertises capabilities: ["htfifo"] once this PR's code is in the binary; the catalog reads the live node set from the distribution layer's node registry, polls /sqs_health on each, and gates PartitionCount > 1 on every node reporting htfifo. Fail-closed on timeouts. References PR #644 as the prior art for the same pattern. 3.C per-queue throttling doc: 5. (Low) §3.4 Retry-After: 1 lied for slow-refill queues. A queue with SendRefillPerSecond = 0.1 needs 10 s for the next token; returning 1 burned 9 unnecessary client retries before the request would succeed. Replaced with a per-queue computation: ceil((1 - currentTokens) / refillRate), floored at 1 s to match Retry-After's integer-second granularity. Fast-refill queues (>= 1 RPS) preserve the existing Retry-After: 1 behaviour. Docs-only PR; no code changes, no implementation gate.
Slice 2b of P2 (docs/design/2026_04_24_proposed_admin_dashboard.md
Section 3.3.2 + 4.1): a follower-side S3 admin write
(POST /buckets, PUT /buckets/{name}/acl, DELETE /buckets/{name})
now hands off to the leader transparently, completing the same
end-to-end forwarding contract Dynamo writes received in #644 +
Stacked on #669 (P2 slice 2a). Once #669 merges, this rebases
cleanly onto main.
Proto (proto/admin_forward.proto):
- Three new ADMIN_OP enum values (CREATE_BUCKET / DELETE_BUCKET /
PUT_BUCKET_ACL) appended after the Dynamo block so existing
wire-format integers stay stable. Regenerated with the pinned
protoc 29.3 / protoc-gen-go 1.36.11 / protoc-gen-go-grpc 1.6.1.
Leader-side ForwardServer (internal/admin/forward_server.go):
- WithBucketsSource lets deployments wire the S3 dispatcher
optionally — Dynamo-only builds keep the receiver nil and the
three new operations return 501 NotImplemented.
- Three new dispatch arms: handleCreateBucket / handleDeleteBucket /
handlePutBucketAcl. Each one mirrors the leader-direct HTTP
path's payload contract (NUL-byte rejection, 64 KiB limit,
DisallowUnknownFields, trailing-token rejection, slash-in-name
rejection) so a hostile follower cannot smuggle a payload past
validations the leader-direct path enforces.
- forwardBucketsErrorResponse mirrors forwardErrorResponse on the
Dynamo side: ErrBucketsForbidden / NotLeader / NotFound /
AlreadyExists / NotEmpty + ValidationError each map to the same
HTTP status the leader-direct writeBucketsError produces, so
forwarded and leader-direct responses are byte-for-byte
indistinguishable from the SPA's view.
- isStructuredSourceError extended to recognise the bucket
sentinels so they are NOT logged at LevelError on the leader.
- Forward's switch was extracted into dispatchForward to keep the
parent function under cyclop's 10-branch ceiling as the operation
enum grew.
Follower-side LeaderForwarder (internal/admin/forward_client.go):
- Interface gains ForwardCreateBucket / ForwardDeleteBucket /
ForwardPutBucketAcl. PutBucketAcl carries both the bucket name
(from the URL path) and the new ACL (from the request body) in
one JSON payload — same approach handleDeleteBucket takes for
the bucket name.
- gRPCForwardClient methods reuse the existing forward() helper
for transport, so connection-cache reuse and ErrLeaderUnavailable
signalling behave identically across resource types.
Handler integration (internal/admin/s3_handler.go):
- New `forwarder LeaderForwarder` field + WithLeaderForwarder method.
- handleCreate / handlePutACL / handleDelete now consult tryForward*
helpers when the source returned ErrBucketsNotLeader; the helpers
are gated on `errors.Is(err, ErrBucketsNotLeader) && forwarder != nil`
so a leader-direct rejection (already-exists, not-found, etc.) is
never re-applied at the leader.
- writeForwardResult / writeForwardFailure mirror the Dynamo handler's
pattern: nosniff + Cache-Control:no-store + Retry-After:1 on 503.
ErrLeaderUnavailable does NOT log at LevelError (elections are
routine); transport errors do log so operators can investigate.
Wiring (main.go + main_admin_forward.go):
- adminForwardServerDeps gains a `buckets` field; readyForRegistration
still requires only TablesSource + RoleStore (so cluster-only or
Dynamo-only builds keep registering Dynamo forwarding without S3).
- runtimeServerRunner.start() now creates *adapter.S3Server BEFORE
startRaftServers (in addition to dynamoServer) so the leader-side
ForwardServer registration sees both adapters. The reorder is safe:
each adapter listens on its own address and the raft TCP listeners
are independent.
- ServerDeps.Forwarder now plumbs through buildS3HandlerForDeps too,
so the follower's S3Handler picks up the same LeaderForwarder
instance the Dynamo handler does.
Tests:
- 9 forward-server tests covering the three new bucket operations:
happy path / no-BucketsSource→501 / bad-JSON 400 / already-exists
409 / not-empty 409 / slash-in-name 400 / missing-acl 400 /
payload-too-large 413 (sweep over all three ops).
- 4 forward-client tests covering ForwardCreateBucket /
ForwardDeleteBucket / ForwardPutBucketAcl happy-path payload
shapes + ErrLeaderUnavailable on no-leader.
- stubLeaderForwarder gains 3 bucket-side forward methods so
existing dynamo tests still satisfy the LeaderForwarder
interface, and the new stub fields let bucket-handler tests
verify the forward arguments.
- 6 handler integration tests on S3Handler.tryForward*: forwarded
create / delete / put-acl happy paths (replay leader's status
+ payload + content-type), forwarder ErrLeaderUnavailable → 503
+ Retry-After, transport-error → 503 + no leakage, and a 3-axis
gate sweep proving the forwarder is NOT invoked on
AlreadyExists / Forbidden / generic source errors.
Closes design 3.3.2 acceptance criteria 2 (transparent forwarding)
+ 6 (forwarded_from in audit log) for S3 admin writes; criterion 3
(election-period 503 + retry) is also live for S3 because the
existing tryForward helpers reuse the same fallback paths.
Slice 2b of P2 (docs/design/2026_04_24_proposed_admin_dashboard.md
Section 3.3.2 + 4.1): a follower-side S3 admin write
(POST /buckets, PUT /buckets/{name}/acl, DELETE /buckets/{name})
now hands off to the leader transparently, completing the same
end-to-end forwarding contract Dynamo writes received in #644 +
Stacked on #669 (P2 slice 2a). Once #669 merges, this rebases
cleanly onto main.
Proto (proto/admin_forward.proto):
- Three new ADMIN_OP enum values (CREATE_BUCKET / DELETE_BUCKET /
PUT_BUCKET_ACL) appended after the Dynamo block so existing
wire-format integers stay stable. Regenerated with the pinned
protoc 29.3 / protoc-gen-go 1.36.11 / protoc-gen-go-grpc 1.6.1.
Leader-side ForwardServer (internal/admin/forward_server.go):
- WithBucketsSource lets deployments wire the S3 dispatcher
optionally — Dynamo-only builds keep the receiver nil and the
three new operations return 501 NotImplemented.
- Three new dispatch arms: handleCreateBucket / handleDeleteBucket /
handlePutBucketAcl. Each one mirrors the leader-direct HTTP
path's payload contract (NUL-byte rejection, 64 KiB limit,
DisallowUnknownFields, trailing-token rejection, slash-in-name
rejection) so a hostile follower cannot smuggle a payload past
validations the leader-direct path enforces.
- forwardBucketsErrorResponse mirrors forwardErrorResponse on the
Dynamo side: ErrBucketsForbidden / NotLeader / NotFound /
AlreadyExists / NotEmpty + ValidationError each map to the same
HTTP status the leader-direct writeBucketsError produces, so
forwarded and leader-direct responses are byte-for-byte
indistinguishable from the SPA's view.
- isStructuredSourceError extended to recognise the bucket
sentinels so they are NOT logged at LevelError on the leader.
- Forward's switch was extracted into dispatchForward to keep the
parent function under cyclop's 10-branch ceiling as the operation
enum grew.
Follower-side LeaderForwarder (internal/admin/forward_client.go):
- Interface gains ForwardCreateBucket / ForwardDeleteBucket /
ForwardPutBucketAcl. PutBucketAcl carries both the bucket name
(from the URL path) and the new ACL (from the request body) in
one JSON payload — same approach handleDeleteBucket takes for
the bucket name.
- gRPCForwardClient methods reuse the existing forward() helper
for transport, so connection-cache reuse and ErrLeaderUnavailable
signalling behave identically across resource types.
Handler integration (internal/admin/s3_handler.go):
- New `forwarder LeaderForwarder` field + WithLeaderForwarder method.
- handleCreate / handlePutACL / handleDelete now consult tryForward*
helpers when the source returned ErrBucketsNotLeader; the helpers
are gated on `errors.Is(err, ErrBucketsNotLeader) && forwarder != nil`
so a leader-direct rejection (already-exists, not-found, etc.) is
never re-applied at the leader.
- writeForwardResult / writeForwardFailure mirror the Dynamo handler's
pattern: nosniff + Cache-Control:no-store + Retry-After:1 on 503.
ErrLeaderUnavailable does NOT log at LevelError (elections are
routine); transport errors do log so operators can investigate.
Wiring (main.go + main_admin_forward.go):
- adminForwardServerDeps gains a `buckets` field; readyForRegistration
still requires only TablesSource + RoleStore (so cluster-only or
Dynamo-only builds keep registering Dynamo forwarding without S3).
- runtimeServerRunner.start() now creates *adapter.S3Server BEFORE
startRaftServers (in addition to dynamoServer) so the leader-side
ForwardServer registration sees both adapters. The reorder is safe:
each adapter listens on its own address and the raft TCP listeners
are independent.
- ServerDeps.Forwarder now plumbs through buildS3HandlerForDeps too,
so the follower's S3Handler picks up the same LeaderForwarder
instance the Dynamo handler does.
Tests:
- 9 forward-server tests covering the three new bucket operations:
happy path / no-BucketsSource→501 / bad-JSON 400 / already-exists
409 / not-empty 409 / slash-in-name 400 / missing-acl 400 /
payload-too-large 413 (sweep over all three ops).
- 4 forward-client tests covering ForwardCreateBucket /
ForwardDeleteBucket / ForwardPutBucketAcl happy-path payload
shapes + ErrLeaderUnavailable on no-leader.
- stubLeaderForwarder gains 3 bucket-side forward methods so
existing dynamo tests still satisfy the LeaderForwarder
interface, and the new stub fields let bucket-handler tests
verify the forward arguments.
- 6 handler integration tests on S3Handler.tryForward*: forwarded
create / delete / put-acl happy paths (replay leader's status
+ payload + content-type), forwarder ErrLeaderUnavailable → 503
+ Retry-After, transport-error → 503 + no leakage, and a 3-axis
gate sweep proving the forwarder is NOT invoked on
AlreadyExists / Forbidden / generic source errors.
Closes design 3.3.2 acceptance criteria 2 (transparent forwarding)
+ 6 (forwarded_from in audit log) for S3 admin writes; criterion 3
(election-period 503 + retry) is also live for S3 because the
existing tryForward helpers reuse the same fallback paths.
Per `docs/design/README.md`'s lifecycle convention, the admin dashboard design doc is now "partial": - **P1** (DynamoDB CRUD + AdminForward) — shipped via #634, #635, #644, #648 - **P2** (S3 buckets list/create/delete/ACL + DescribeTable) — shipped via #658, with #669 + #673 in flight - **P3** (React SPA + embed) — shipped via #649, #650 - **P4** (TLS / role / CSRF / operator docs) — TLS, role, CSRF are already live in P1; operator docs in #674 Independent of the in-flight slice 2 PRs (#669/#673) and the docs PR (#674) — this rename only reflects what is already on main today, plus an "Implementation status" table mapping each phase to the PR it landed in. ## What this PR changes - `git mv` the design doc from `2026_04_24_proposed_admin_dashboard.md` to `2026_04_24_partial_admin_dashboard.md` so its history follows - Add an "Implementation status" header table indexing each phase to the PRs that landed it - List the outstanding open items so future readers know what is still owed against the original proposal: - AdminForward acceptance criterion 5 (rolling-upgrade compat flag) — deferred - S3 object browser — explicitly out of scope per Section 2 Non-goals - TLS cert hot-reload — restart-to-rotate is the documented model When the rolling-upgrade flag lands, the doc gets renamed once more to `2026_04_24_implemented_admin_dashboard.md` per the README's lifecycle convention.
P2 slice 2b: a follower-side S3 admin write (POST /buckets, PUT
/buckets/{name}/acl, DELETE /buckets/{name}) now hands off to the leader
transparently, completing the same end-to-end forwarding contract Dynamo
writes received in #644 + #648.
**Stacked on #669 (P2 slice 2a).** Once #669 merges, this rebases
cleanly onto main.
## Summary
- **Proto** — three new `ADMIN_OP` enum values (`CREATE_BUCKET` /
`DELETE_BUCKET` / `PUT_BUCKET_ACL`) appended after the Dynamo block so
existing wire integers stay stable. Regenerated with the pinned protoc
29.3 / protoc-gen-go 1.36.11 / protoc-gen-go-grpc 1.6.1.
- **Leader-side** — `ForwardServer.WithBucketsSource` lets deployments
wire the S3 dispatcher optionally; Dynamo-only builds keep it nil and
the new operations return 501. Three new dispatch arms mirror the
leader-direct HTTP path's payload contract (NUL-byte rejection, 64 KiB
limit, `DisallowUnknownFields`, trailing-token rejection, slash-in-name
rejection). `forwardBucketsErrorResponse` mirrors
`forwardErrorResponse`: `ErrBucketsForbidden` / `NotLeader` / `NotFound`
/ `AlreadyExists` / `NotEmpty` + `*ValidationError` each map to the same
HTTP status `writeBucketsError` produces, so forwarded and leader-direct
responses are byte-identical from the SPA's view.
- **Follower-side** — `LeaderForwarder` interface gains
`ForwardCreateBucket` / `ForwardDeleteBucket` / `ForwardPutBucketAcl`.
`ForwardPutBucketAcl` carries both the bucket name (URL path) and the
new ACL (request body) in one JSON payload — same approach
`ForwardDeleteTable` takes for the table name.
- **Handler integration** — `S3Handler.WithLeaderForwarder` +
`tryForwardCreateBucket` / `tryForwardPutBucketAcl` /
`tryForwardDeleteBucket` gated on `ErrBucketsNotLeader && forwarder !=
nil`. `writeForwardResult` / `writeForwardFailure` mirror the Dynamo
handler's pattern (nosniff + Cache-Control:no-store + Retry-After:1 on
503).
- **Wiring** — `runtimeServerRunner.start()` now creates
`*adapter.S3Server` before `startRaftServers` so the leader-side
`ForwardServer` registration sees both adapters.
`adminForwardServerDeps.buckets` is wired through
`registerAdminForwardServer`. `ServerDeps.Forwarder` now plumbs into
`S3Handler` too.
## What is NOT in this PR
- Rolling-upgrade compatibility flag (criterion 5) — still deferred
behind a cluster-version bump.
## Test plan
- [x] `go build ./...`
- [x] `go vet ./...`
- [x] `golangci-lint run` (admin + main + adapter + proto: 0 issues)
- [x] `go test ./internal/admin/ -count=1 -race` — full admin suite
passes (19 new tests across forward_server / forward_client / s3_handler
integration)
- [x] `go test . -count=1 -race` — main package passes
- [x] Targeted adapter tests pass
- [ ] Full `./adapter/` times out due to a pre-existing flake (verified
earlier on PRs #648 / #658 / #669 — unrelated to this branch)
- [ ] End-to-end smoke against a 3-node cluster — manual exercise after
#669 + this land
## Acceptance criteria coverage (Section 3.3.2 + 4.1)
| # | Criterion | Status |
|---|---|---|
| 1 | Leader direct write | ✓ for S3 (#669 / slice 2a) |
| 2 | Follower forwards transparently | ✅ for S3 (this PR) |
| 3 | Election-period 503 + retry | ✅ for S3 (this PR — same fallback as
Dynamo) |
| 4 | Leader demotes stale full role | ✅ for S3 (existing
`validatePrincipal` runs before bucket dispatch) |
| 5 | Rolling-upgrade compat flag | ⏳ deferred (cluster-version bump) |
| 6 | `forwarded_from` in audit log | ✅ for S3 (this PR — leader-side
audit log carries it) |
## Self-review (5 lenses)
1. **Data loss**: No FSM / Raft / Pebble path changes. The leader-side
dispatcher reuses the same `BucketsSource` interface that the
leader-direct HTTP path uses; the only new code path is the JSON
envelope + principal re-validation.
2. **Concurrency**: New methods on `gRPCForwardClient` reuse the
existing `forward()` helper, so connection-cache reuse and
`ErrLeaderUnavailable` signalling behave identically across resource
types. The `runtimeServerRunner.start()` reorder is safe — each adapter
listens on its own address and the raft TCP listeners are independent.
3. **Performance**: One additional gRPC round-trip on follower-side
admin writes; identical to Dynamo's path.
4. **Data consistency**: Leader-side dispatcher applies the same strict
body decode the leader-direct HTTP path does (NUL-byte rejection,
trailing-token rejection, slash-in-name rejection).
`forwardBucketsErrorResponse` produces byte-identical envelopes to
`writeBucketsError` so forwarded and leader-direct responses are
indistinguishable.
5. **Test coverage**: 19 new tests — 9 forward-server (happy paths, 501
no-source, bad-JSON, already-exists, not-empty, slash-in-name,
missing-acl, payload-too-large sweep), 4 forward-client (3 happy-path
payload shapes + ErrLeaderUnavailable for all three), 6
handler-integration (forwarded create/delete/put-acl happy +
leader-unavailable + transport-error + 3-axis non-NotLeader gate sweep).
Slice 2b of P2 (docs/design/2026_04_24_proposed_admin_dashboard.md
Section 3.3.2 + 4.1): a follower-side S3 admin write
(POST /buckets, PUT /buckets/{name}/acl, DELETE /buckets/{name})
now hands off to the leader transparently, completing the same
end-to-end forwarding contract Dynamo writes received in #644 +
Stacked on #669 (P2 slice 2a). Once #669 merges, this rebases
cleanly onto main.
Proto (proto/admin_forward.proto):
- Three new ADMIN_OP enum values (CREATE_BUCKET / DELETE_BUCKET /
PUT_BUCKET_ACL) appended after the Dynamo block so existing
wire-format integers stay stable. Regenerated with the pinned
protoc 29.3 / protoc-gen-go 1.36.11 / protoc-gen-go-grpc 1.6.1.
Leader-side ForwardServer (internal/admin/forward_server.go):
- WithBucketsSource lets deployments wire the S3 dispatcher
optionally — Dynamo-only builds keep the receiver nil and the
three new operations return 501 NotImplemented.
- Three new dispatch arms: handleCreateBucket / handleDeleteBucket /
handlePutBucketAcl. Each one mirrors the leader-direct HTTP
path's payload contract (NUL-byte rejection, 64 KiB limit,
DisallowUnknownFields, trailing-token rejection, slash-in-name
rejection) so a hostile follower cannot smuggle a payload past
validations the leader-direct path enforces.
- forwardBucketsErrorResponse mirrors forwardErrorResponse on the
Dynamo side: ErrBucketsForbidden / NotLeader / NotFound /
AlreadyExists / NotEmpty + ValidationError each map to the same
HTTP status the leader-direct writeBucketsError produces, so
forwarded and leader-direct responses are byte-for-byte
indistinguishable from the SPA's view.
- isStructuredSourceError extended to recognise the bucket
sentinels so they are NOT logged at LevelError on the leader.
- Forward's switch was extracted into dispatchForward to keep the
parent function under cyclop's 10-branch ceiling as the operation
enum grew.
Follower-side LeaderForwarder (internal/admin/forward_client.go):
- Interface gains ForwardCreateBucket / ForwardDeleteBucket /
ForwardPutBucketAcl. PutBucketAcl carries both the bucket name
(from the URL path) and the new ACL (from the request body) in
one JSON payload — same approach handleDeleteBucket takes for
the bucket name.
- gRPCForwardClient methods reuse the existing forward() helper
for transport, so connection-cache reuse and ErrLeaderUnavailable
signalling behave identically across resource types.
Handler integration (internal/admin/s3_handler.go):
- New `forwarder LeaderForwarder` field + WithLeaderForwarder method.
- handleCreate / handlePutACL / handleDelete now consult tryForward*
helpers when the source returned ErrBucketsNotLeader; the helpers
are gated on `errors.Is(err, ErrBucketsNotLeader) && forwarder != nil`
so a leader-direct rejection (already-exists, not-found, etc.) is
never re-applied at the leader.
- writeForwardResult / writeForwardFailure mirror the Dynamo handler's
pattern: nosniff + Cache-Control:no-store + Retry-After:1 on 503.
ErrLeaderUnavailable does NOT log at LevelError (elections are
routine); transport errors do log so operators can investigate.
Wiring (main.go + main_admin_forward.go):
- adminForwardServerDeps gains a `buckets` field; readyForRegistration
still requires only TablesSource + RoleStore (so cluster-only or
Dynamo-only builds keep registering Dynamo forwarding without S3).
- runtimeServerRunner.start() now creates *adapter.S3Server BEFORE
startRaftServers (in addition to dynamoServer) so the leader-side
ForwardServer registration sees both adapters. The reorder is safe:
each adapter listens on its own address and the raft TCP listeners
are independent.
- ServerDeps.Forwarder now plumbs through buildS3HandlerForDeps too,
so the follower's S3Handler picks up the same LeaderForwarder
instance the Dynamo handler does.
Tests:
- 9 forward-server tests covering the three new bucket operations:
happy path / no-BucketsSource→501 / bad-JSON 400 / already-exists
409 / not-empty 409 / slash-in-name 400 / missing-acl 400 /
payload-too-large 413 (sweep over all three ops).
- 4 forward-client tests covering ForwardCreateBucket /
ForwardDeleteBucket / ForwardPutBucketAcl happy-path payload
shapes + ErrLeaderUnavailable on no-leader.
- stubLeaderForwarder gains 3 bucket-side forward methods so
existing dynamo tests still satisfy the LeaderForwarder
interface, and the new stub fields let bucket-handler tests
verify the forward arguments.
- 6 handler integration tests on S3Handler.tryForward*: forwarded
create / delete / put-acl happy paths (replay leader's status
+ payload + content-type), forwarder ErrLeaderUnavailable → 503
+ Retry-After, transport-error → 503 + no leakage, and a 3-axis
gate sweep proving the forwarder is NOT invoked on
AlreadyExists / Forbidden / generic source errors.
Closes design 3.3.2 acceptance criteria 2 (transparent forwarding)
+ 6 (forwarded_from in audit log) for S3 admin writes; criterion 3
(election-period 503 + retry) is also live for S3 because the
existing tryForward helpers reuse the same fallback paths.
Per docs/design/README.md's lifecycle convention. The original P1–P4 plan has fully shipped: - P1 (admin skeleton + Dynamo + AdminForward) — #634/#635/#644/#648 - P2 (S3 endpoints incl. write paths and AdminForward integration) — #658 / #669 / #673 / #695 (TOCTOU safety net) - P3 (React SPA + embed) — #649 / #650 - P4 (TLS / role / CSRF / operator doc / deployment runbook / scripts/rolling-update.sh admin support) — #674 / #669 / #678 The AdminDeleteBucket TOCTOU caught during PR #669 review (the last "in-flight" item that kept the doc at _partial_) is fully resolved by the safety-net design landed in #695. What changed: - git mv 2026_04_24_partial_admin_dashboard.md → 2026_04_24_implemented_admin_dashboard.md (history follows the rename) - Header Status line: "Partial" → "Implemented", explanation updated to reflect the post-fix state and the rationale for promotion. - "Last updated" bumped to 2026-04-28 with the rename trigger. - Section heading "Outstanding open items" → "Out-of-scope follow-ups" — the remaining three entries (criterion 5, object browser, TLS hot-reload) are not in-flight work; they are deferred-at-design or Non-goal items. The TOCTOU bullet is removed (resolved) and replaced with a one-line cross-link to the safety-net design + admin_deployment.md §4.6 contract. - Removed the closing "rename trigger" sentence — we just did the rename. - Status table: P2 row now lists #695 alongside #658/#669/#673 so a future reader can find the TOCTOU fix from the index. - Cross-references updated everywhere the old filename appeared: docs/admin.md (header link + Cross-references) docs/admin_deployment.md (header link + final cross-ref) docs/design/2026_04_28_proposed_admin_delete_bucket_safety_net.md (Background section pointer) internal/admin/config.go (Section 7.1 reference comment) No code changes other than the comment-only filename refresh in config.go.
## Summary Promote the admin dashboard design doc from `_partial_` → `_implemented_` per `docs/design/README.md`'s lifecycle convention. PR #695 landed the TOCTOU safety-net fix (the last in-flight item that kept the doc at `_partial_`), so the original P1–P4 plan is now fully shipped: | Phase | Landed via | |---|---| | P1 (admin skeleton + Dynamo + AdminForward) | #634 / #635 / #644 / #648 | | P2 (S3 endpoints + writes + AdminForward S3 + TOCTOU fix) | #658 / #669 / #673 / **#695** | | P3 (React SPA + embed) | #649 / #650 | | P4 (TLS / role / CSRF / operator doc / runbook / script wiring) | #674 / #669 / #678 | The remaining three items in the doc move from "Outstanding open items" (in-flight) to **"Out-of-scope follow-ups"** (deferred-at-design or Non-goal): - AdminForward criterion 5 — rolling-upgrade flag, deferred behind a cluster-version bump that doesn't exist yet - S3 object browser — Non-goal per §2.2 - TLS cert hot-reload — out of scope per `docs/admin.md` ## Changes - `git mv` partial → implemented (history follows the rename) - Status line / Last-updated / status-table / Out-of-scope section content reflects the promotion - Cross-references updated in all 4 referencing files: `docs/admin.md`, `docs/admin_deployment.md`, `docs/design/2026_04_28_proposed_admin_delete_bucket_safety_net.md`, `internal/admin/config.go` (comment-only) ## Test plan - [x] No code changes other than a comment-only filename refresh in `config.go` - [x] `go build ./...` passes - [x] `golangci-lint run ./internal/admin/...` — 0 issues - [x] `grep -rn "2026_04_24_partial\|2026_04_24_proposed_admin"` returns nothing — no stale references
Phase 1 + 2 of Task #26: the follower-side
LeaderForwarderclient and its integration into the dynamo HTTP handler. Builds on the AdminForward leader-side dispatcher landed via #635.Summary
LeaderForwarderinterface (internal/admin/forward_client.go) decouples the dynamo HTTP handler frompb.AdminForwardClient. The handler stays proto-free; the bridge inmain_admin.go(next phase) plugs in the gRPC-backed implementation.gRPCForwardClienttranslates aCreateTableRequest/ table-name into anAdminForwardRequest, dials via aGRPCConnFactory(production wrapskv.GRPCConnCache), and re-shapes the response intoForwardResult(status, payload, content-type).forwarded_from = nodeIDis populated so the leader's audit log carries the trace (criterion 6, leader-side already shipped in admin: AdminForward RPC + leader-side dispatcher (P1, partial) #635).status_code == 0upgrades to502 Bad Gateway; missingContentTypefills the JSON default. Both surface transport bugs rather than producing silently-malformed SPA responses.ErrLeaderUnavailablesentinel signals the "no leader known" case so the handler can map to 503 +Retry-After: 1(criterion 3).DynamoHandlergains aforwarderfield andWithLeaderForwarderoption. When set,handleCreate/handleDeletecatchErrTablesNotLeaderfrom the source and forward to the leader transparently — the SPA cannot tell forwarded from leader-direct.writeForwardResultre-emits the leader's structured response verbatim (status + payload + content-type), so a forwarded409 Conflictfrom the leader stays409on the wire — no re-classification.writeForwardFailuremapsErrLeaderUnavailable(election in flight) and gRPC transport errors to 503 +Retry-After: 1.ErrLeaderUnavailableis intentionally NOT logged at error level (elections are routine); transport errors are logged at LevelError so operators can investigate.What is NOT in this PR
main.go(production wiring of theForwardServerfrom admin: AdminForward RPC + leader-side dispatcher (P1, partial) #635) — comes in the next phase.kv.GRPCConnCacheand suppliesLeaderAddressResolver— same phase.Test plan
go build ./...go vet ./...golangci-lint run(admin package: 0 issues)go test ./internal/admin/ -count=1 -raceforwarded_fromround-trip,ErrLeaderUnavailable, dial/RPC errors propagated withcockroachdb/errorswrapping, zero status code upgrade, missing content type fallbackErrLeaderUnavailable→ 503 + Retry-After, transport error → 503 + log, leader 409 pass-through, leader 503 + Retry-After preserved, role check short-circuits before forward, body validation short-circuits before forwardpb.RegisterAdminForwardServerinmain.goand exercise an end-to-end follower → leader call against a real cluster (next PR).Acceptance criteria coverage
forwarded_fromin audit log