| 1 | +# elastickv admin dashboard — operator guide |
| 2 | + |
| 3 | +This document covers configuration and day-2 operation of the admin |
| 4 | +HTTP listener. Architecture and design rationale live in |
| 5 | +[docs/design/2026_04_24_proposed_admin_dashboard.md](design/2026_04_24_proposed_admin_dashboard.md); |
| 6 | +read that first if you're touching the code. |
| 7 | + |
| 8 | +## What the admin dashboard is |
| 9 | + |
| 10 | +A separate HTTP listener (default `127.0.0.1:8080`) that exposes a |
| 11 | +React SPA + JSON API for inspecting the cluster and managing |
| 12 | +DynamoDB tables / S3 buckets without having to construct SigV4 |
| 13 | +requests. It is **disabled by default**: set `-adminEnabled` to turn |
| 14 | +it on. |
| 15 | + |
| 16 | +The listener is independent of the data-plane DynamoDB |
| 17 | +(`-dynamoAddress`) and S3 (`-s3Address`) endpoints — credentials, |
| 18 | +TLS, and auth are configured separately. |
| 19 | + |
| 20 | +## Quick start (loopback dev) |
| 21 | + |
| 22 | +The minimum invocation that produces a working dashboard: |
| 23 | + |
| 24 | +```sh |
| 25 | +./elastickv \ |
| 26 | + -raftId=n1 -raftBootstrap \ |
| 27 | + -dynamoAddress=127.0.0.1:8000 \ |
| 28 | + -s3Address=127.0.0.1:9000 \ |
| 29 | + -s3CredentialsFile=/path/to/creds.json \ |
| 30 | + -adminEnabled \ |
| 31 | + -adminSessionSigningKeyFile=/path/to/admin-hs256.b64 \ |
| 32 | + -adminFullAccessKeys=AKIA_ADMIN |
| 33 | +``` |
| 34 | + |
| 35 | +Then open `http://127.0.0.1:8080/admin/` in a browser and log in |
| 36 | +with the access key + secret pair from the credentials file. |
| 37 | + |
| 38 | +## Configuration reference |
| 39 | + |
| 40 | +### Required when `-adminEnabled=true` |
| 41 | + |
| 42 | +| Flag | Description | |
| 43 | +|---|---| |
| 44 | +| `-adminEnabled` | Master on/off switch. Default `false`. | |
| 45 | +| `-adminSessionSigningKey` *or* `-adminSessionSigningKeyFile` *or* `ELASTICKV_ADMIN_SESSION_SIGNING_KEY` | Cluster-shared base64-encoded HS256 key — **exactly 64 raw bytes** (88 base64 chars with standard padding, or 86 with `RawURLEncoding`). The validator rejects any other length at startup with a precise error message. **Must be the same on every node** — JWTs minted by node A are verified by node B during follower→leader forwarding, so a mismatch breaks the dashboard's read paths on follower nodes. The `*File` / env-var forms keep the secret out of `/proc/<pid>/cmdline`. | |
| 46 | +| `-s3CredentialsFile` | JSON file with at least one access key + secret key pair. Same file the S3 adapter uses for SigV4; the admin dashboard reuses it for login authentication. | |
| 47 | +| `-adminFullAccessKeys` *and/or* `-adminReadOnlyAccessKeys` | Comma-separated allow-lists. Only access keys listed here may log into the dashboard, even if their SigV4 secret validates against the credentials file. Keys must not appear in both lists. | |
| 48 | + |
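The key-length rule above (64 raw bytes, 88 base64 characters with standard padding) can be checked before the file is handed to `-adminSessionSigningKeyFile`. The following is a minimal sketch, not the project's own tooling: the `KEY_FILE` path is a placeholder, and stripping the trailing newline is an assumption, since this guide does not say whether the validator tolerates one. `base64 -d` is the GNU coreutils spelling; other platforms may differ.

```sh
# Hypothetical output path; point -adminSessionSigningKeyFile at it.
KEY_FILE="${KEY_FILE:-./admin-hs256.b64}"

# Keep the key file private to the current user.
umask 077

# 64 random bytes, base64-encoded with standard padding, on a single line.
head -c 64 /dev/urandom | base64 | tr -d '\n' > "$KEY_FILE"

# Sanity checks: 88 encoded characters that decode back to 64 raw bytes.
encoded_len=$(wc -c < "$KEY_FILE" | tr -d ' ')
decoded_len=$(base64 -d < "$KEY_FILE" | wc -c | tr -d ' ')
echo "encoded=$encoded_len decoded=$decoded_len"
```

Copy the same file to every node, since the key must be identical cluster-wide.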
| 49 | +### Optional |
| 50 | + |
| 51 | +| Flag | Description | |
| 52 | +|---|---| |
| 53 | +| `-adminListen` | host:port for the admin listener. Defaults to `127.0.0.1:8080`. | |
| 54 | +| `-adminTLSCertFile` / `-adminTLSKeyFile` | PEM cert + key. Both must be set together; a partial config fails validation at startup. | |
| 55 | +| `-adminAllowPlaintextNonLoopback` | Explicit opt-out for the non-loopback-without-TLS startup hard-error. **Strongly discouraged** — lets the listener accept plaintext on a non-loopback bind. **Does not** affect the cookie `Secure` attribute (that is `-adminAllowInsecureDevCookie` below); a deployment that sets only this flag will mint `Secure` cookies that the browser refuses to send over the plaintext channel, breaking session lifetime end-to-end. Pair it with `-adminAllowInsecureDevCookie` if the goal is a working plaintext rig. | |
| 56 | +| `-adminSessionSigningKeyPrevious` *or* `-adminSessionSigningKeyPreviousFile` *or* `ELASTICKV_ADMIN_SESSION_SIGNING_KEY_PREVIOUS` | Previous HS256 key used only for verification during a rotation window. New tokens always use the primary key; existing tokens minted under the previous key continue to verify until they expire. | |
| 57 | +| `-adminAllowInsecureDevCookie` | Mints session cookies without `Secure` for local plaintext development. Do not set on any deployment that touches a network. | |
| 58 | + |
| 59 | +### Hard-error startup conditions |
| 60 | + |
| 61 | +The process fails to start (non-zero exit) when: |
| 62 | + |
| 63 | +- `-adminEnabled=true` but `-s3CredentialsFile` is empty or missing, or its parsed map has zero entries. Without credentials every login would be rejected; to lock the dashboard down, set `-adminEnabled=false` instead. |
| 64 | +- `-adminEnabled=true` but `-adminSessionSigningKey`, its `*File` variant, and the env var all decode to empty. |
| 65 | +- `-adminEnabled=true` but `-adminListen` is empty or not a valid host:port. |
| 66 | +- `-adminTLSCertFile` xor `-adminTLSKeyFile` is set (partial TLS config). |
| 67 | +- `-adminListen` is bound to a non-loopback address, TLS is not configured, **and** `-adminAllowPlaintextNonLoopback` is not set. The error message names the flag combinations that resolve it. |
| 68 | +- `-adminFullAccessKeys` and `-adminReadOnlyAccessKeys` overlap (the same access key listed in both). |
| 69 | + |
| 70 | +These are deliberate: silent fallbacks to "auth disabled" or "TLS |
| 71 | +off" would downgrade security guarantees without the operator noticing. |
| 72 | + |
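Two of the conditions above (overlapping role lists and a partial TLS pair) are easy to catch in a deploy script before the process is started. The helpers below are a hypothetical pre-flight sketch; `overlapping_keys` and `tls_pair_ok` are illustrative names, and the server's own validator remains the authority (for instance, how it treats whitespace around commas is not specified here).

```sh
# List access keys that appear in both comma-separated allow-lists.
# Duplicates within a single list are not reported as overlap.
overlapping_keys() {
  {
    printf '%s\n' "$1" | tr ',' '\n' | sort -u
    printf '%s\n' "$2" | tr ',' '\n' | sort -u
  } | sed '/^$/d' | sort | uniq -d
}

# Succeeds when the TLS cert and key paths are both set or both empty.
tls_pair_ok() {
  if [ -n "$1" ] && [ -z "$2" ]; then return 1; fi
  if [ -z "$1" ] && [ -n "$2" ]; then return 1; fi
  return 0
}
```

A wrapper script might call `overlapping_keys "$FULL_KEYS" "$RO_KEYS"` and refuse to continue when the output is non-empty.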
| 73 | +## TLS setup |
| 74 | + |
| 75 | +Two supported topologies: |
| 76 | + |
| 77 | +### A. Loopback only (`127.0.0.1` / `::1`) |
| 78 | + |
| 79 | +No TLS required. By default the dashboard mints cookies with |
| 80 | +`Secure=true`, which most modern browsers accept on the loopback |
| 81 | +origin even without TLS (the loopback-is-trusted policy). If a |
| 82 | +specific browser refuses the cookie in this configuration, set |
| 83 | +`-adminAllowInsecureDevCookie` to mint without `Secure`. The flag |
| 84 | +is intentionally distinct from `-adminAllowPlaintextNonLoopback` |
| 85 | +because a listener can be plaintext (it is on loopback) for reasons |
| 86 | +unrelated to whether the cookie needs to drop `Secure`. |
| 87 | + |
| 88 | +### B. Reachable address with TLS |
| 89 | + |
| 90 | +Set `-adminListen` to the public bind, plus `-adminTLSCertFile` and |
| 91 | +`-adminTLSKeyFile`. TLS 1.2+ is enforced. Cookies are issued with |
| 92 | +`Secure; SameSite=Strict; HttpOnly`. |
| 93 | + |
| 94 | +Cert renewal: the listener picks up the cert files at startup only; |
| 95 | +restart the process after rotating certs. Hot-reload is not |
| 96 | +implemented (out of scope for the dashboard's maintenance model). |
| 97 | + |
| 98 | +### Discouraged: plaintext non-loopback |
| 99 | + |
| 100 | +`-adminAllowPlaintextNonLoopback` exists as an escape hatch for |
| 101 | +short-lived test deployments. The session JWT and its bearer cookie |
| 102 | +travel in clear text in this mode; anyone on the path can replay |
| 103 | +the token until it expires. Do not enable on a long-running |
| 104 | +deployment. |
| 105 | + |
| 106 | +A working plaintext rig also needs `-adminAllowInsecureDevCookie` — |
| 107 | +otherwise the dashboard mints cookies with `Secure=true` and the |
| 108 | +browser refuses to send them back over plaintext, so login appears |
| 109 | +to succeed but every subsequent request 401s. The two flags are |
| 110 | +deliberately separate so a misconfigured deployment fails closed |
| 111 | +on either axis (TLS guard or cookie attribute) rather than |
| 112 | +silently downgrading both at once. |
| 113 | + |
| 114 | +## Roles |
| 115 | + |
| 116 | +Two roles, both checked against the live `-adminFullAccessKeys` / |
| 117 | +`-adminReadOnlyAccessKeys` lists on **every** state-changing |
| 118 | +request (not just at login): |
| 119 | + |
| 120 | +- **read-only** — may list / describe Dynamo tables and S3 buckets, view cluster status. Cannot create, mutate ACL, or delete. |
| 121 | +- **full** — adds POST / PUT / DELETE on `/dynamo/tables` and `/s3/buckets`. |
| 122 | + |
| 123 | +A key removed from a node's `-adminFullAccessKeys` loses write |
| 124 | +access on its next request to that node: the dashboard does not |
| 125 | +wait for the token to expire. The token's role claim is treated as |
| 126 | +a hint; the node's live role index (loaded at startup) is authoritative. |
| 127 | + |
| 128 | +## API surface |
| 129 | + |
| 130 | +All endpoints are under `/admin/api/v1/`. Authentication: cookie |
| 131 | +session minted by `POST /auth/login`; CSRF: double-submit token in |
| 132 | +`admin_csrf` cookie + `X-Admin-CSRF` header on every state-changing |
| 133 | +method. |
| 134 | + |
| 135 | +| Method | Path | Role | Notes | |
| 136 | +|---|---|---|---| |
| 137 | +| `POST` | `/auth/login` | none | Body `{access_key, secret_key}`. Sets `admin_session` and `admin_csrf` cookies. | |
| 138 | +| `POST` | `/auth/logout` | any | Invalidates the session cookie. | |
| 139 | +| `GET` | `/cluster` | any | Node ID, Raft leader, version. | |
| 140 | +| `GET` | `/dynamo/tables` | any | Paginated list. `?limit=` (default 100, max 1000). | |
| 141 | +| `POST` | `/dynamo/tables` | full | Body schema in design 4.2. | |
| 142 | +| `GET` | `/dynamo/tables/{name}` | any | Schema + GSI summary. | |
| 143 | +| `DELETE` | `/dynamo/tables/{name}` | full | 204 on success. | |
| 144 | +| `GET` | `/s3/buckets` | any | Paginated list with the same `?limit=` semantics. | |
| 145 | +| `POST` | `/s3/buckets` | full | Body `{bucket_name, acl?}`. ACL omitted defaults to `private`. | |
| 146 | +| `GET` | `/s3/buckets/{name}` | any | Bucket meta + ACL. | |
| 147 | +| `PUT` | `/s3/buckets/{name}/acl` | full | Body `{acl}`. Only `private` and `public-read` are accepted. | |
| 148 | +| `DELETE` | `/s3/buckets/{name}` | full | 204 on success. The bucket must be empty (no objects); a non-empty bucket returns 409 `bucket_not_empty`. | |
| 149 | + |
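For scripts that drive the JSON API directly, the login and double-submit CSRF flow described above might look like the following sketch. It assumes the quick-start loopback address, that the `X-Admin-CSRF` header simply echoes the `admin_csrf` cookie value (the usual double-submit pattern), and that a JSON `Content-Type` header is expected; none of these are confirmed by this guide. The function names and the `BASE` / `JAR` variables are hypothetical, and the JSON bodies are built without escaping, so keys containing quotes would need extra care. Whether a given HTTP client replays `Secure` cookies over a plaintext loopback URL is client-specific and not covered here.

```sh
# Assumed base URL from the quick start; adjust for your deployment.
BASE="${BASE:-http://127.0.0.1:8080/admin/api/v1}"
JAR="${JAR:-./admin-cookies.txt}"

# Log in; curl stores admin_session and admin_csrf in the cookie jar.
admin_login() {
  curl -sS -c "$JAR" -H 'Content-Type: application/json' \
    -d "{\"access_key\":\"$1\",\"secret_key\":\"$2\"}" \
    "$BASE/auth/login"
}

# Read the admin_csrf value out of a Netscape-format cookie jar
# (tab-separated; field 6 is the cookie name, field 7 its value).
csrf_from_jar() {
  awk -F '\t' '$6 == "admin_csrf" { print $7 }' "$1"
}

# Create a bucket, echoing the CSRF cookie in the X-Admin-CSRF header.
admin_create_bucket() {
  curl -sS -b "$JAR" -H 'Content-Type: application/json' \
    -H "X-Admin-CSRF: $(csrf_from_jar "$JAR")" \
    -d "{\"bucket_name\":\"$1\"}" \
    "$BASE/s3/buckets"
}
```

Typical use would be `admin_login AKIA_ADMIN "$SECRET"` followed by `admin_create_bucket my-bucket`.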
| 150 | +## Follower → leader forwarding |
| 151 | + |
| 152 | +Writes (`POST` / `PUT` / `DELETE`) require the local node to be the |
| 153 | +Raft leader. When the SPA's request hits a follower, the dashboard |
| 154 | +transparently forwards the call to the leader over an internal |
| 155 | +gRPC service (`AdminForward`). The leader re-validates the |
| 156 | +principal against its own `-adminFullAccessKeys` list before |
| 157 | +acting — a follower cannot smuggle a downgraded key past the |
| 158 | +leader's view. |
| 159 | + |
| 160 | +This means there is **no need to point the SPA at a specific |
| 161 | +node**: any node with `-adminEnabled` can serve the dashboard. |
| 162 | +Operators that fan out behind a load balancer get the same |
| 163 | +behaviour as a single-node cluster, with one caveat below. |
| 164 | + |
| 165 | +### Follower forwarding caveat: rolling configuration changes |
| 166 | + |
| 167 | +A configuration change (e.g. adding `AKIA_NEW` to |
| 168 | +`-adminFullAccessKeys`) must propagate to **every node** before |
| 169 | +the new key works against any follower's dashboard. During the |
| 170 | +rollout window: |
| 171 | + |
| 172 | +- A login against a node that has not yet been restarted with the new flags fails with 403. |
| 173 | +- A token minted by an updated node, replayed against a not-yet-updated node, will be re-validated against that node's stale role list. If the key is missing on the older node, the request fails with 403 even though the token is structurally valid. |
| 174 | + |
| 175 | +The dashboard does not have an automatic role-refresh path — restart |
| 176 | +each node after editing the access-key flags. |
| 177 | + |
| 178 | +### Election-period 503 |
| 179 | + |
| 180 | +When the leader steps down mid-write (or has not yet been elected |
| 181 | +after a fresh start), the forwarder cannot reach a leader and the |
| 182 | +SPA receives `503 Service Unavailable` with a `Retry-After: 1` |
| 183 | +header. The current SPA client (`web/admin/src/api/client.ts`) |
| 184 | +makes a single `fetch` call with no automatic retry, so the user |
| 185 | +sees the 503 surfaced directly and must re-issue the action. The |
| 186 | +`Retry-After: 1` header is still emitted so a future client (or an |
| 187 | +external operator script driving the JSON API) can implement the |
| 188 | +one-second back-off the server is asking for. Operators |
| 189 | +investigating "intermittent 503s" should look at Raft leader-churn |
| 190 | +logs first. |
| 191 | + |
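An external script can implement the back-off that the SPA currently skips. The sketch below is one possible shape, assuming the server always sends `Retry-After` in delta-seconds form as shown above; `request_with_retry`, `retry_after_seconds`, and `MAX_TRIES` are hypothetical names. It discards the response body and only reports the final status code. Whether a write interrupted by a leader step-down may already have been applied is not stated in this guide, so retrying a non-idempotent `POST` blindly is a judgment call.

```sh
# Extract the Retry-After value (seconds) from a header dump on stdin.
# Falls back to 1 when the header is absent.
retry_after_seconds() {
  awk -F ':' 'tolower($1) == "retry-after" { gsub(/[^0-9]/, "", $2); v = $2 }
              END { print (v == "" ? 1 : v) }'
}

# Re-issue a request while the server answers 503, up to MAX_TRIES attempts.
# Arguments are passed to curl unchanged; prints the final status code.
request_with_retry() {
  tries=0
  hdrs=$(mktemp)
  while :; do
    status=$(curl -sS -o /dev/null -D "$hdrs" -w '%{http_code}' "$@")
    tries=$((tries + 1))
    if [ "$status" != "503" ] || [ "$tries" -ge "${MAX_TRIES:-5}" ]; then
      break
    fi
    sleep "$(retry_after_seconds < "$hdrs")"
  done
  rm -f "$hdrs"
  echo "$status"
}
```

Capping the attempts keeps a script from spinning through a prolonged election; persistent 503s are better diagnosed from the Raft logs.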
| 192 | +## Audit log |
| 193 | + |
| 194 | +Every state-changing admin request emits structured slog lines at |
| 195 | +`INFO` level under the `admin_audit` key on the emitting node's stdout (or |
| 196 | +wherever the process slog handler is wired). A protected-chain |
| 197 | +mutation (Dynamo / S3 / cluster / keyviz writes) typically produces |
| 198 | +**two** audit lines: one operation-specific line from the source |
| 199 | +that performed the mutation, plus one generic HTTP-shaped line from |
| 200 | +the `Audit` middleware. Auth endpoints (`/auth/login`, `/auth/logout`) |
| 201 | +produce **one** line — the action-specific one from `AuthService` — |
| 202 | +because the generic middleware is intentionally not wrapped around |
| 203 | +them (see the per-shape section below for why). The shapes differ |
| 204 | +by source — log parsers should treat the `admin_audit` key as a |
| 205 | +union and dispatch on the fields present. |
| 206 | + |
| 207 | +**`Audit` middleware** — emitted for non-GET/HEAD/OPTIONS requests |
| 208 | +on the **protected mux chain** (Dynamo, S3, cluster, keyviz) after |
| 209 | +`SessionAuth` accepts the session, but **before** `CSRFDoubleSubmit` |
| 210 | +runs. That ordering is deliberate: a CSRF-rejected protected |
| 211 | +request still produces an audit line because the actor is already |
| 212 | +known, but an unauthenticated request (no / invalid session) is |
| 213 | +rejected at `SessionAuth` and never reaches the middleware. The |
| 214 | +following endpoints are **not** wrapped by this middleware and rely |
| 215 | +on their own `admin_audit` emission instead: |
| 216 | + |
| 217 | +- `/auth/login` — runs without a pre-existing session, so the |
| 218 | + generic middleware cannot identify the actor; `AuthService` |
| 219 | + emits `admin_audit action=login` (success and failure) directly. |
| 220 | +- `/auth/logout` — runs through `protectNoAudit` so logout produces |
| 221 | + exactly one `admin_audit action=logout` line from `AuthService` |
| 222 | + rather than two (a generic line plus the action-specific one). |
| 223 | + |
| 224 | +For requests that *do* reach the middleware, the line is always |
| 225 | +present on the node that received the HTTP request — which may be |
| 226 | +a follower if the request was then forwarded: |
| 227 | + |
| 228 | +``` |
| 229 | +admin_audit actor=AKIA_ADMIN role=full method=POST path=/admin/api/v1/s3/buckets status=201 remote=10.0.0.7:51234 duration=8.2ms |
| 230 | +``` |
| 231 | + |
| 232 | +**`S3Handler` operation line** — emitted on the leader after a |
| 233 | +successful bucket mutation. Only the S3 admin path emits these; the |
| 234 | +DynamoDB admin path relies on the middleware line plus the forwarded |
| 235 | +line below for its audit trail: |
| 236 | + |
| 237 | +``` |
| 238 | +admin_audit actor=AKIA_ADMIN role=full operation=create_bucket bucket=my-bucket |
| 239 | +admin_audit actor=AKIA_ADMIN role=full operation=put_bucket_acl bucket=my-bucket acl=public-read |
| 240 | +admin_audit actor=AKIA_ADMIN role=full operation=delete_bucket bucket=my-bucket |
| 241 | +``` |
| 242 | + |
| 243 | +**`ForwardServer` operation line** — emitted on the leader when a |
| 244 | +follower forwarded the request via `AdminForward`. Carries the |
| 245 | +originating follower's node ID in `forwarded_from`. Covers both |
| 246 | +DynamoDB and S3 admin operations: |
| 247 | + |
| 248 | +``` |
| 249 | +admin_audit actor=AKIA_ADMIN role=full forwarded_from=n2 operation=create_table table=orders |
| 250 | +admin_audit actor=AKIA_ADMIN role=full forwarded_from=n2 operation=delete_table table=orders |
| 251 | +admin_audit actor=AKIA_ADMIN role=full forwarded_from=n2 operation=put_bucket_acl bucket=my-bucket acl=public-read |
| 252 | +``` |
| 253 | + |
| 254 | +CR and LF in `forwarded_from` are stripped at the entry point — a |
| 255 | +hostile follower cannot split a single audit line into two by |
| 256 | +smuggling control characters into its node ID. |
| 257 | + |
| 258 | +Login and logout emit their own `admin_audit` lines so the JWT's |
| 259 | +lifetime can be correlated with the mutations it authorised. The |
| 260 | +two shapes differ on a single field — login carries `claimed_actor` |
| 261 | +because the access key the operator typed is distinct from the |
| 262 | +authenticated `actor` (a successful login proves they match; a |
| 263 | +failed login records what was claimed), while logout has no claim |
| 264 | +to verify and omits the field: |
| 265 | + |
| 266 | +``` |
| 267 | +admin_audit action=login actor=AKIA_ADMIN claimed_actor=AKIA_ADMIN remote=10.0.0.7:51234 status=200 |
| 268 | +admin_audit action=logout actor=AKIA_ADMIN remote=10.0.0.7:51234 status=200 |
| 269 | +``` |
| 270 | + |
| 271 | +Log parsers consuming this shape should treat `claimed_actor` as |
| 272 | +present-only-on-login. |
| 273 | + |
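Because `admin_audit` is a union of shapes, a parser has to dispatch on which fields are present. The following sketch classifies lines that look like the samples above; it assumes the `key=value` text layout shown in this section (a different slog handler, such as JSON, would need a different parser), and the function names and category labels are illustrative only.

```sh
# Classify one audit line by the fields it carries.
# Order matters: forwarded lines also carry operation=, so they are
# checked before the plain operation shape.
classify_audit_line() {
  case " $1 " in
    *" action="*)         echo auth ;;
    *" forwarded_from="*) echo forwarded ;;
    *" operation="*)      echo operation ;;
    *" method="*)         echo http ;;
    *)                    echo unknown ;;
  esac
}

# Tally the audit shapes seen on stdin, ignoring lines without admin_audit.
summarize_audit() {
  grep 'admin_audit' | while IFS= read -r line; do
    classify_audit_line "$line"
  done | sort | uniq -c
}
```

Piping a node's log through `summarize_audit` gives a quick count per shape, which helps when checking that a mutation produced the lines expected for its path.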
| 274 | +## Troubleshooting |
| 275 | + |
| 276 | +### "admin listener is enabled but no static credentials are configured" |
| 277 | + |
| 278 | +Either `-s3CredentialsFile` is unset or the file parses to an empty |
| 279 | +map. Check the file exists and contains at least one entry: |
| 280 | +```json |
| 281 | +{"credentials":[{"access_key_id":"AKIA_ADMIN","secret_access_key":"..."}]} |
| 282 | +``` |
| 283 | + |
| 284 | +### "is not loopback but TLS is not configured" |
| 285 | + |
| 286 | +Default-deny safety net. Either set `-adminTLSCertFile` + |
| 287 | +`-adminTLSKeyFile`, or pass `-adminAllowPlaintextNonLoopback` (and |
| 288 | +read the TLS section above before doing so). |
| 289 | + |
| 290 | +### Login returns 401 invalid_credentials |
| 291 | + |
| 292 | +The access key + secret pair did not match an entry in |
| 293 | +`-s3CredentialsFile`. Either the access key is unknown or the secret |
| 294 | +is wrong. Verify the credentials file is the one the running process |
| 295 | +loaded (it is read once at startup) and that the secret matches |
| 296 | +exactly; secrets are compared byte-for-byte with `subtle.ConstantTimeCompare`, so |
| 297 | +trailing whitespace counts. |
| 298 | + |
| 299 | +### Login returns 403 forbidden |
| 300 | + |
| 301 | +The credentials matched, but the access key is not listed in either |
| 302 | +`-adminFullAccessKeys` or `-adminReadOnlyAccessKeys`. This is a |
| 303 | +distinct case from the 401 above: the operator has valid SigV4 |
| 304 | +credentials for the data plane but no admin role assignment. Add the |
| 305 | +key to one of the role flags and **restart every node** so each |
| 306 | +node's live role index picks up the change. |
| 307 | + |
| 308 | +### Write returns 403 forbidden |
| 309 | + |
| 310 | +The principal's role is read-only. Move the access key into |
| 311 | +`-adminFullAccessKeys` (and remove it from |
| 312 | +`-adminReadOnlyAccessKeys`), then **restart every node** so each |
| 313 | +node's live role index picks up the change. |
| 314 | + |
| 315 | +### Write returns 503 leader_unavailable |
| 316 | + |
| 317 | +The Raft cluster is mid-election. Wait the one second requested by the |
| 318 | +`Retry-After: 1` header, then re-issue the request. If it persists past one or |
| 319 | +two seconds, check Raft leader status via the admin |
| 320 | +`/admin/api/v1/cluster` endpoint or `cmd/elastickv-admin`. |
| 321 | + |
| 322 | +### `bucket_not_empty` on DELETE |
| 323 | + |
| 324 | +The dashboard cannot force a recursive delete by design — the |
| 325 | +SPA's job is to surface the error and guide the operator to clean |
| 326 | +up first. Use the SigV4 S3 path (`aws s3 rm s3://<bucket> --recursive`) |
| 327 | +to drain the bucket, then retry the DELETE on the dashboard. |
| 328 | + |
| 329 | +### Stuck SPA / blank screen |
| 330 | + |
| 331 | +The dashboard ships a placeholder `internal/admin/dist/index.html` |
| 332 | +that renders a "bundle missing" page when `make` was run without |
| 333 | +the SPA build step. Run `cd web/admin && npm install && npm run build` |
| 334 | +to populate the embedded `dist` directory, then rebuild the binary. |
| 335 | + |
| 336 | +## Cross-references |
| 337 | + |
| 338 | +- Design rationale: [docs/design/2026_04_24_proposed_admin_dashboard.md](design/2026_04_24_proposed_admin_dashboard.md) (renamed to `_partial_` in PR #675; this link will follow once that lands) |
| 339 | +- Architecture overview: [docs/architecture_overview.md](architecture_overview.md) |
| 340 | +- AdminForward RPC contract: `proto/admin_forward.proto` |