# elastickv admin dashboard: operator guide

This document covers configuration and day-2 operation of the admin
HTTP listener. Architecture and design rationale live in
[docs/design/2026_04_24_proposed_admin_dashboard.md](design/2026_04_24_proposed_admin_dashboard.md);
read that first if you're touching the code.

## What the admin dashboard is

The admin dashboard is a separate HTTP listener (default
`127.0.0.1:8080`) that serves a React SPA and a JSON API for
inspecting the cluster and managing DynamoDB tables and S3 buckets
without hand-constructing SigV4 requests. It is **disabled by
default**; set `-adminEnabled` to turn it on.

The listener is independent of the data-plane DynamoDB
(`-dynamoAddress`) and S3 (`-s3Address`) endpoints: credentials,
TLS, and auth are configured separately.

## Quick start (loopback dev)

The minimum invocation that produces a working dashboard:

```sh
./elastickv \
  -raftId=n1 -raftBootstrap \
  -dynamoAddress=127.0.0.1:8000 \
  -s3Address=127.0.0.1:9000 \
  -s3CredentialsFile=/path/to/creds.json \
  -adminEnabled \
  -adminSessionSigningKeyFile=/path/to/admin-hs256.b64 \
  -adminFullAccessKeys=AKIA_ADMIN
```

Then open `http://127.0.0.1:8080/admin/` in a browser and log in
with the access key + secret pair from the credentials file.

## Configuration reference

### Required when `-adminEnabled=true`

| Flag | Description |
|---|---|
| `-adminEnabled` | Master on/off switch. Default `false`. |
| `-adminSessionSigningKey` *or* `-adminSessionSigningKeyFile` *or* `ELASTICKV_ADMIN_SESSION_SIGNING_KEY` | Cluster-shared base64-encoded HS256 key (≥ 32 raw bytes / 44 base64 chars). **Must be the same on every node**: JWTs minted by node A are verified by node B during follower→leader forwarding, so a mismatch breaks the dashboard's read paths on follower nodes. The `*File` / env-var forms keep the secret out of `/proc/<pid>/cmdline`. |
| `-s3CredentialsFile` | JSON file with at least one access key + secret key pair. Same file the S3 adapter uses for SigV4; the admin dashboard reuses it for login authentication. |
| `-adminFullAccessKeys` *and/or* `-adminReadOnlyAccessKeys` | Comma-separated allow-lists. Only access keys listed here may log into the dashboard, even if their SigV4 secret validates against the credentials file. Keys must not appear in both lists. |
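
Any 32 random bytes, base64-encoded, satisfy the length requirement above. A minimal sketch for creating the key file, assuming a POSIX shell with `head`, `base64` (with `-d` for decoding, as in GNU coreutils), and `/dev/urandom`; the helper names and path are illustrative and not part of elastickv:

```sh
# Generate a 32-byte HS256 key, base64-encoded (44 characters), readable
# only by the current user. The umask runs in a subshell so the caller's
# umask is left untouched.
gen_admin_key() {
  ( umask 077 && head -c 32 /dev/urandom | base64 > "$1" )
}

# Print how many raw bytes a key file decodes to; it should be >= 32.
admin_key_bytes() {
  base64 -d < "$1" | wc -c | tr -d ' '
}
```

Generate the file once (for example `gen_admin_key /etc/elastickv/admin-hs256.b64`), then copy that same file to every node; a separate key per node would break follower→leader forwarding.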

### Optional

| Flag | Description |
|---|---|
| `-adminListen` | host:port for the admin listener. Defaults to `127.0.0.1:8080`. |
| `-adminTLSCertFile` / `-adminTLSKeyFile` | PEM cert + key. Both must be set together; a partial config fails validation at startup. |
| `-adminAllowPlaintextNonLoopback` | Explicit opt-out for the non-loopback-without-TLS startup hard-error. **Strongly discouraged**: it lets the dashboard mint cookies without the `Secure` attribute and ship session JWTs over plaintext. Use only for short-lived test rigs you control. |
| `-adminSessionSigningKeyPrevious` *or* `-adminSessionSigningKeyPreviousFile` *or* `ELASTICKV_ADMIN_SESSION_SIGNING_KEY_PREVIOUS` | Previous HS256 key used only for verification during a rotation window. New tokens always use the primary key; existing tokens minted under the previous key continue to verify until they expire. |
| `-adminAllowInsecureDevCookie` | Mints session cookies without `Secure` for local plaintext development. Do not set on any deployment that touches a network. |

### Hard-error startup conditions

The process fails to start (non-zero exit) when:

- `-adminEnabled=true` but `-s3CredentialsFile` is empty or missing, or its parsed map has zero entries. Without credentials every login would be rejected; the supported way to run a locked-down admin is `-adminEnabled=false`.
- `-adminEnabled=true` but `-adminSessionSigningKey`, `-adminSessionSigningKeyFile`, and `ELASTICKV_ADMIN_SESSION_SIGNING_KEY` all decode to empty.
- `-adminEnabled=true` but `-adminListen` is empty or not a valid host:port.
- Exactly one of `-adminTLSCertFile` and `-adminTLSKeyFile` is set (partial TLS config).
- `-adminListen` is bound to a non-loopback address, TLS is not configured, **and** `-adminAllowPlaintextNonLoopback` is not set. The error message names the flag combinations that resolve it.
- `-adminFullAccessKeys` and `-adminReadOnlyAccessKeys` overlap (the same access key is listed in both).

These are deliberate: silent fallbacks to "auth disabled" or "TLS
off" would downgrade security guarantees without the operator knowing.
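
Some of these conditions are cheap to check before a restart. For example, the overlap rule can be tested with a short function. This is a hypothetical pre-flight helper, not something elastickv ships, and it assumes access keys contain no whitespace or glob characters:

```sh
# Print every access key present in both comma-separated lists
# (arguments: full-access list, read-only list). No output means the
# lists are disjoint and the overlap check will not fire.
overlapping_keys() {
  for k in $(printf '%s' "$1" | tr ',' ' '); do
    for r in $(printf '%s' "$2" | tr ',' ' '); do
      if [ "$k" = "$r" ]; then printf '%s\n' "$k"; fi
    done
  done
}
```

A deploy script could refuse to restart a node whenever `overlapping_keys "$FULL_KEYS" "$RO_KEYS"` prints anything.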

## TLS setup

Two topologies are supported:

### A. Loopback only (`127.0.0.1` / `::1`)

No TLS is required. In normal loopback operation, session cookies are
still minted with `Secure` and rely on the browser's
loopback-is-trusted policy. Only when `-adminAllowInsecureDevCookie`
is set are they issued without `Secure`.

### B. Reachable address with TLS

Set `-adminListen` to the reachable bind address, plus
`-adminTLSCertFile` and `-adminTLSKeyFile`. TLS 1.2+ is enforced.
Cookies are issued with `Secure; SameSite=Strict; HttpOnly`.

Cert renewal: the listener reads the cert files only at startup, so
restart the process after rotating certs. Hot-reload is not
implemented (out of scope for the dashboard's maintenance model).
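
For a short-lived test rig, a self-signed certificate is enough to exercise topology B. A sketch using the `openssl` CLI (OpenSSL 1.1.1 or newer, for `-addext`); the hostname, file names, bind address, and 30-day lifetime are illustrative, and a long-running deployment should use a certificate from a CA your browsers already trust:

```sh
# Write a throwaway self-signed cert and an unencrypted RSA key in PEM form.
# Arguments: DNS name the browser will use, output directory.
make_admin_test_cert() {
  host="$1"
  dir="$2"
  openssl req -x509 -newkey rsa:2048 -nodes -days 30 \
    -subj "/CN=${host}" \
    -addext "subjectAltName=DNS:${host}" \
    -keyout "${dir}/admin-key.pem" \
    -out "${dir}/admin-cert.pem" 2>/dev/null
}

# Then start the node with, for example:
#   -adminListen=10.0.0.5:8080 \
#   -adminTLSCertFile=/etc/elastickv/admin-cert.pem \
#   -adminTLSKeyFile=/etc/elastickv/admin-key.pem
```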

### Discouraged: plaintext non-loopback

`-adminAllowPlaintextNonLoopback` exists as an escape hatch for
short-lived test deployments. The session JWT and its bearer cookie
travel in clear text in this mode; anyone on the path can replay
the token until it expires. Do not enable on a long-running
deployment.

## Roles

Two roles, both checked against the live `-adminFullAccessKeys` /
`-adminReadOnlyAccessKeys` lists on **every** state-changing
request (not just at login):

- **read-only**: may list and describe Dynamo tables and S3 buckets, and view cluster status. Cannot create, change ACLs, or delete.
- **full**: adds POST / PUT / DELETE on `/dynamo/tables` and `/s3/buckets`.

Once a node is restarted with a key removed from
`-adminFullAccessKeys`, that key loses write access on its next
request to that node; the dashboard does not wait for the token to
expire. The token's role claim is treated as a hint; the live role
index is authoritative.
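
The lookup can be pictured as a function of the key and the two current flag values, with the token's own role claim playing no part. A hypothetical sketch of that rule (not elastickv's code), assuming the lists contain no whitespace:

```sh
# Resolve a key's role from the live allow-lists.
# Arguments: access key, -adminFullAccessKeys value, -adminReadOnlyAccessKeys value.
live_role() {
  # Wrapping both sides in commas avoids prefix matches (AKIA_A vs AKIA_AB).
  case ",$2," in *",$1,"*) echo full; return 0 ;; esac
  case ",$3," in *",$1,"*) echo read-only; return 0 ;; esac
  echo none
}
```

A token that still claims `full` for a key now listed only in the read-only flag resolves to `read-only` here, which is why its next write is refused.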

## API surface

All endpoints are under `/admin/api/v1/`. Authentication: a cookie
session minted by `POST /auth/login`. CSRF: a double-submit token,
carried in the `admin_csrf` cookie and echoed in the `X-Admin-CSRF`
header on every state-changing method.

| Method | Path | Role | Notes |
|---|---|---|---|
| `POST` | `/auth/login` | none | Body `{access_key, secret_key}`. Sets `admin_session` and `admin_csrf` cookies. |
| `POST` | `/auth/logout` | any | Invalidates the session cookie. |
| `GET` | `/cluster` | any | Node ID, Raft leader, version. |
| `GET` | `/dynamo/tables` | any | Paginated list. `?limit=` (default 100, max 1000). |
| `POST` | `/dynamo/tables` | full | Body schema: see section 4.2 of the design doc. |
| `GET` | `/dynamo/tables/{name}` | any | Schema + GSI summary. |
| `DELETE` | `/dynamo/tables/{name}` | full | 204 on success. |
| `GET` | `/s3/buckets` | any | Paginated list with the same `?limit=` semantics. |
| `POST` | `/s3/buckets` | full | Body `{bucket_name, acl?}`. If `acl` is omitted it defaults to `private`. |
| `GET` | `/s3/buckets/{name}` | any | Bucket meta + ACL. |
| `PUT` | `/s3/buckets/{name}/acl` | full | Body `{acl}`. Only `private` and `public-read` are accepted. |
| `DELETE` | `/s3/buckets/{name}` | full | 204 on success. The bucket must be empty (no objects); a non-empty bucket returns 409 `bucket_not_empty`. |
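
Scripts that talk to the API directly have to keep both cookies and echo the CSRF value into the header. A minimal sketch with `curl`, assuming its Netscape-format cookie jar (`-c` to save, `-b` to send), the quick-start loopback listener, and that login itself needs no CSRF header because no session exists yet; the helper names, jar path, and bucket name are illustrative. If the cookies carry `Secure` and the listener is plain HTTP, curl may decline to send them back, so such scripts are easiest against a TLS listener or with `-adminAllowInsecureDevCookie`:

```sh
ADMIN=http://127.0.0.1:8080/admin/api/v1
JAR=./admin-cookies.txt

# Print the admin_csrf value stored in a curl cookie jar. Jar lines are
# tab-separated: field 6 is the cookie name, field 7 its value.
csrf_from_jar() {
  awk -F '\t' '$6 == "admin_csrf" { print $7 }' "$1"
}

# Log in and store admin_session + admin_csrf in the jar.
# Arguments: access key, secret key (must not contain double quotes).
admin_login() {
  curl -sS -c "$JAR" -H 'Content-Type: application/json' \
    -d "{\"access_key\":\"$1\",\"secret_key\":\"$2\"}" \
    "$ADMIN/auth/login"
}

# State-changing call: send the cookies, echo the CSRF token, print status.
admin_delete_bucket() {
  curl -sS -b "$JAR" -X DELETE \
    -H "X-Admin-CSRF: $(csrf_from_jar "$JAR")" \
    -w '%{http_code}\n' "$ADMIN/s3/buckets/$1"
}
```

For example, `admin_login AKIA_ADMIN "$SECRET"` followed by `admin_delete_bucket scratch` prints the response status, 204 on success or 409 for a non-empty bucket.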

## Follower → leader forwarding

Writes (`POST` / `PUT` / `DELETE`) require the local node to be the
Raft leader. When the SPA's request hits a follower, the dashboard
transparently forwards the call to the leader over an internal
gRPC service (`AdminForward`). The leader re-validates the
principal against its own `-adminFullAccessKeys` list before
acting, so a follower cannot push through a write for a key that the
leader has already downgraded.

This means there is **no need to point the SPA at a specific
node**: any node with `-adminEnabled` can serve the dashboard.
Operators that fan out behind a load balancer get the same
behaviour as a single-node cluster, with one caveat below.
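
Forwarding depends on every node verifying JWTs that its peers minted, which is why the signing key must match cluster-wide. One way to compare key files without printing the secret is to hash the decoded bytes. A sketch assuming GNU `sha256sum` and `base64 -d`, and that the key files have already been collected locally; the helper names are illustrative:

```sh
# Fingerprint of the decoded key: safe to print, log, and compare.
key_fingerprint() {
  base64 -d < "$1" | sha256sum | cut -d ' ' -f 1
}

# Succeed only if every key file given decodes to the same bytes.
keys_match() {
  first=$(key_fingerprint "$1")
  shift
  for f in "$@"; do
    [ "$(key_fingerprint "$f")" = "$first" ] || return 1
  done
}
```

For example, `keys_match n1.b64 n2.b64 n3.b64 || echo 'signing keys differ'`. Hashing the decoded bytes rather than the raw file ignores harmless differences such as a trailing newline.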

### Follower forwarding caveat: rolling configuration changes

A configuration change (e.g. adding `AKIA_NEW` to
`-adminFullAccessKeys`) must propagate to **every node** before
the new key works consistently across the cluster's dashboards.
During the rollout window:

- A login against a node that has not yet been restarted with the new flags is rejected with 401 `invalid_credentials`, because that node does not yet list the key.
- A token minted by an updated node, replayed against a not-yet-updated node, is re-validated against that node's stale role list. If the key is missing there, the request fails with 403 even though the token is structurally valid.

The dashboard has no automatic role-refresh path: restart
each node after editing the access-key flags.
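
Because each node checks principals against its own flags, it is worth confirming the lists agree before declaring the rollout done. Order and duplicates do not affect membership, so compare a normalised form. A sketch assuming the flag values have already been gathered from each node's configuration (how you collect them is deployment-specific) and that surrounding spaces are not significant; the helper names are illustrative:

```sh
# Sort and de-duplicate a comma-separated key list; drop blanks and spaces.
normalize_keys() {
  printf '%s' "$1" | tr ',' '\n' | tr -d ' ' | sed '/^$/d' \
    | LC_ALL=C sort -u | paste -s -d ',' -
}

# Succeed only if all arguments normalise to the same list.
same_key_lists() {
  first=$(normalize_keys "$1")
  shift
  for l in "$@"; do
    [ "$(normalize_keys "$l")" = "$first" ] || return 1
  done
}
```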

### Election-period 503

When the leader steps down mid-write (or has not yet been elected
after a fresh start), the forwarder cannot reach a leader and the
SPA receives `503 Service Unavailable` with a `Retry-After: 1`
header. The SPA's API client honours `Retry-After` and re-issues
the request once. Operators investigating "intermittent 503s"
should look at Raft leader-churn logs first.
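
A script calling the API directly can mimic the SPA's single retry. A sketch that saves response headers with curl's `-D`, reads `Retry-After` from them, and falls back to one second when the header is missing or not an integer; the helper names are illustrative, and any extra curl arguments (cookies, CSRF header, method, URL) are passed through unchanged:

```sh
# Print the Retry-After value from a saved header dump (empty if absent).
# Header names are matched case-insensitively; CRs are removed first.
retry_after() {
  tr -d '\r' < "$1" \
    | awk -F': *' 'tolower($1) == "retry-after" { v = $2 } END { print v }'
}

# Issue a request; on 503, wait Retry-After seconds and retry once.
# Prints the final HTTP status code.
admin_call_with_retry() {
  hdr=$(mktemp)
  code=$(curl -sS -o /dev/null -D "$hdr" -w '%{http_code}' "$@")
  if [ "$code" = "503" ]; then
    sleep "$(retry_after "$hdr" | grep -E '^[0-9]+$' || echo 1)"
    code=$(curl -sS -o /dev/null -D "$hdr" -w '%{http_code}' "$@")
  fi
  rm -f "$hdr"
  echo "$code"
}
```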

## Audit log

Every state-changing admin request emits a structured slog line at
`INFO` level on the leader's stdout (or wherever the process's slog
handler is wired):

```
admin_audit actor=AKIA_ADMIN role=full method=POST path=/admin/api/v1/dynamo/tables status=201 duration=8.2ms
```

For forwarded requests, an extra `forwarded_from=<node-id>` field
identifies the follower that received the original HTTP call. CR
and LF in that field are stripped at the entry point, so a hostile
follower cannot split a single audit line into two by smuggling
control characters into its node ID.
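
The effect of that sanitising step is easy to reproduce in the shell. This is only an illustration (the server does it in-process before logging), and `strip_crlf` is a hypothetical name:

```sh
# Remove every CR and LF so the value cannot start a new log line.
strip_crlf() {
  printf '%s' "$1" | tr -d '\r\n'
}
```

Without the stripping, a node ID such as `n2\r\nadmin_audit actor=root` would render as a second, forged audit line; after it, the whole value stays on one line.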

Login and logout emit their own audit lines (`action=login` /
`action=logout`) so the JWT's lifetime can be correlated with the
mutations it authorised.
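
Because each field is a `key=value` pair, standard text tools are enough to slice the audit stream. A sketch that lists failed requests (status 400 or higher) with the actor and, for forwarded calls, the originating follower. It assumes values contain no spaces, as in the example line above, and matches `admin_audit` anywhere on the line so extra slog fields such as `time=` or `level=` do not matter; the function name is illustrative:

```sh
# Read log lines on stdin; for admin_audit lines with status >= 400, print
# actor, method, path, status, and forwarded_from (or "-" if absent).
failed_admin_writes() {
  awk '/admin_audit/ {
    split("", f)                       # clear the field map for this line
    for (i = 1; i <= NF; i++) {
      n = index($i, "=")
      if (n > 0) f[substr($i, 1, n - 1)] = substr($i, n + 1)
    }
    if (f["status"] + 0 >= 400) {
      ff = ("forwarded_from" in f) ? f["forwarded_from"] : "-"
      print f["actor"], f["method"], f["path"], f["status"], ff
    }
  }'
}
```

For example, `failed_admin_writes < elastickv.log` on the leader's log.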

## Troubleshooting

### "admin listener is enabled but no static credentials are configured"

Either `-s3CredentialsFile` is unset or the file parses to an empty
map. Check the file exists and contains at least one entry:

```json
{"credentials":[{"access_key_id":"AKIA_ADMIN","secret_access_key":"..."}]}
```

### "is not loopback but TLS is not configured"

Default-deny safety net. Either set `-adminTLSCertFile` +
`-adminTLSKeyFile`, or pass `-adminAllowPlaintextNonLoopback` (and
read the TLS section above before doing so).
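
To predict whether a given `-adminListen` value will trip this check, a rough pre-flight approximation follows. It is not the server's logic: it counts only `127.0.0.0/8` and bracketed `[::1]` literals as loopback, so a hostname such as `localhost` or a wildcard such as `0.0.0.0` is reported as non-loopback. The function name is hypothetical:

```sh
# Return 0 if the host part of a host:port value is a loopback IP literal.
is_loopback_listen() {
  case "$1" in
    127.*:*) return 0 ;;       # IPv4 127.0.0.0/8
    '[::1]':*) return 0 ;;     # IPv6 loopback, bracketed
    *) return 1 ;;             # everything else, including 0.0.0.0
  esac
}
```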

### Login returns 401 invalid_credentials

The access key + secret pair did not match the credentials file, or
the key is not listed in `-adminFullAccessKeys` /
`-adminReadOnlyAccessKeys`. The dashboard does not distinguish the
two cases on the wire (both produce 401), but the leader's audit
log shows the precise reason.

### Write returns 403 forbidden

The principal's role is read-only. Move the access key into
`-adminFullAccessKeys` (and remove it from
`-adminReadOnlyAccessKeys`), then **restart every node** so each
node's live role index picks up the change.

### Write returns 503 leader_unavailable

The Raft cluster is mid-election. Retry after the one second that the
`Retry-After: 1` header asks for. If the error persists beyond one or
two seconds, check Raft leader status via the admin
`GET /admin/api/v1/cluster` endpoint or `cmd/elastickv-admin`.

### `bucket_not_empty` on DELETE

The dashboard deliberately cannot force a recursive delete; the
SPA's job is to surface the error and guide the operator to clean
up first. Use the SigV4 S3 path, pointed at the `-s3Address` endpoint
(`aws s3 rm s3://<bucket> --recursive --endpoint-url http://127.0.0.1:9000`
for the quick-start setup), to drain the bucket, then retry the
DELETE on the dashboard.

### Stuck SPA / blank screen

The dashboard ships a placeholder `internal/admin/dist/index.html`
that renders a "bundle missing" page when `make` was run without
the SPA build step. Run `cd web/admin && npm install && npm run build`
to populate the embedded `dist` directory, then rebuild the binary.

## Cross-references

- Design rationale: [docs/design/2026_04_24_partial_admin_dashboard.md](design/2026_04_24_partial_admin_dashboard.md)
- Architecture overview: [docs/architecture_overview.md](architecture_overview.md)
- AdminForward RPC contract: `proto/admin_forward.proto`