8 changes: 5 additions & 3 deletions docs/admin_ui_key_visualizer_design.md
@@ -319,11 +319,13 @@ Because writes are recorded by Raft leaders and follower-local reads are recorde
|---|---|---|
| 0 | `cmd/elastickv-admin` skeleton, token-protected `Admin` gRPC service stub, empty SPA shell, CI wiring. | Binary builds, `/api/cluster/overview` returns live data from a real node only when the configured admin token is supplied. |
| 1 | Overview, Routes, Raft Groups, Adapters pages. `LiveSummary` added. No sampler. | All read-only pages match `grpcurl` ground truth. |
| 2 | Key Visualizer MVP: in-memory sampler with adaptive sub-sampling, leader writes, leader/follower reads, fan-out across nodes, static matrix API with virtual-bucket metadata. | Benchmark gate green; heatmap shows synthetic hotspot within 2 s of load; ±5% / 95%-CI accuracy SLO holds under synthetic bursts; fan-out returns complete view with 1 node down. |
| 3 | Bytes series, drill-down, split/merge continuity, namespace-isolated persistence of compacted columns distributed **per owning Raft group**, lineage recovery, and retention GC. | Heatmap remains continuous across a live `SplitRange`; restart preserves last 7 days; expired data and stale lineage records are collected; no single Raft group sees more than its share of KeyViz writes. |
| 2-A | Key Visualizer MVP server side: in-memory sampler with adaptive sub-sampling, leader writes, leader/follower reads, static matrix API with virtual-bucket metadata. | Benchmark gate green; ±5% / 95%-CI accuracy SLO holds under synthetic bursts; matrix endpoint returns the local node's view. |
| 2-B | KeyViz SPA integration into `web/admin/`: heatmap page, series picker, row budget, manual + auto refresh. See `docs/design/2026_04_27_proposed_keyviz_spa_integration.md`. | Heatmap shows synthetic hotspot within ~5 s of `make client` driving traffic against `make run`; type check (`tsc -b --noEmit`) clean. |
| 2-C | Cluster fan-out: admin RPC that aggregates each node's local sampler view so the SPA shows a cluster-wide heatmap rather than the local node's slice. | Fan-out returns complete view with 1 node down; SPA renders aggregate within the §10 budget. |
| 3 | Drill-down, split/merge continuity, namespace-isolated persistence of compacted columns distributed **per owning Raft group**, lineage recovery, and retention GC. | Heatmap remains continuous across a live `SplitRange`; restart preserves last 7 days; expired data and stale lineage records are collected; no single Raft group sees more than its share of KeyViz writes. |
| 4 (deferred) | Mutating admin operations (`SplitRange` from UI), browser login, RBAC, and identity-provider integration. Out of scope for this design; a follow-up design will cover it. | — |

Phases 0–2 are the minimum operationally useful product; Phase 3 is the "ship-quality" target.
Phases 0–2 (A/B/C) together are the minimum operationally useful product; Phase 3 is the "ship-quality" target. As of 2026-04-27, Phase 2-A is shipped (PRs #639/#645/#646/#647/#651/#660/#661/#672), Phase 2-B lands with this proposal, and Phase 2-C is open. Bytes series, originally listed under Phase 3, was rolled forward into 2-A and is already on the wire.

## 13. Open Questions

259 changes: 259 additions & 0 deletions docs/design/2026_04_27_proposed_keyviz_spa_integration.md
@@ -0,0 +1,259 @@
---
status: proposed
phase: 2-B
parent_design: docs/admin_ui_key_visualizer_design.md
date: 2026-04-27
---

# KeyViz SPA Integration (Phase 2-B)

## 1. Background

Phase 2-A of the Key Visualizer design (`docs/admin_ui_key_visualizer_design.md`)
landed the **server side** end-to-end:

- `keyviz.MemSampler` with COW route table, ring-buffer history, and
bytes counters (PR #639).
- `ShardedCoordinator` write- and read-path observation (PR #645,
PR #661).
- `adapter.AdminServer.GetKeyVizMatrix` gRPC RPC (PR #646).
- `internal/admin` HTTP handler at `/admin/api/v1/keyviz/matrix`
(PR #660 + PR #672 follow-up).
- `main.go` end-to-end wiring (PR #647 / PR #651).

The remaining piece is the **frontend** — the admin SPA at `web/admin/`
already serves Overview / DynamoDB / SQS / S3, but has no KeyViz page.
This doc proposes Phase 2-B: integrate the heatmap into the existing
SPA rather than building a separate dashboard.

## 2. Why integrate, not build separately

§3 of the parent design left open the question of where the SPA
lives. An inventory of what already exists:

- `web/admin/` is a Vite + React 18 + TypeScript + Tailwind SPA, built
into `internal/admin/dist` and embedded via `embed.go`.
- It is served from the same Go process as the API (`internal/admin`),
on the same admin listener, so there is **no second origin**.
- Auth is an HttpOnly `admin_session` cookie plus a double-submit
`admin_csrf` cookie, applied uniformly by `apiFetch` in `src/api/client.ts`.
- The KeyViz HTTP handler is already mounted on the same `apiBase`
(`/admin/api/v1`) and the same authn / CSRF middleware stack.
- Layout + nav (`src/components/Layout.tsx`) is already a list-driven
pattern — adding a tab is one entry.

Building a second SPA would duplicate:

- the Vite / Tailwind / ESLint / tsconfig toolchain,
- auth, session, CSRF, and 401 redirect logic (`auth.tsx` + `useApi.ts`),
- the embed pipeline (`internal/admin/embed.go` + `dist` glob),
- the cookie origin (a separate origin would force CORS or a reverse
proxy hack just to read `admin_session`).

Net cost of integration is **three new files plus three line edits**.
Net cost of a parallel SPA is on the order of weeks of toolchain and
auth re-plumbing, with no upside the user would observe.

**Decision: integrate into `web/admin/`.**

## 3. Surface area

### 3.1 New page

`web/admin/src/pages/KeyViz.tsx` mounted at route `/keyviz`. The page
contains:

- A header with the series picker (`writes` / `reads` / `write_bytes` /
`read_bytes`), a row-budget input (default 1024, capped server-side),
a refresh button, and a small "auto-refresh: off / 5 s / 30 s" toggle.
- The heatmap canvas itself: `<canvas>` rendered from the `Values[][]`
matrix the API returns. Rows on the Y axis are routes (one per
`KeyVizRow`), columns on the X axis are time bins from
`ColumnUnixMs`. Cell colour intensity is normalised against the
per-matrix max so a quiet column does not look identical to a hot
one.
- A row-detail flyout: hovering over a row reveals `bucket_id`,
`start`, `end`, `aggregate`, `route_count`, and (when present)
`route_ids` with a `route_ids_truncated` indicator.
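
Because `start` and `end` arrive as base64 strings, the flyout needs some way to turn them into readable labels. The following is a minimal sketch of one option, assuming standard (non-URL-safe) base64 as produced by Go's `encoding/json`; `decodeBase64` and `keyLabel` are hypothetical helper names, and the `\xNN` escaping of non-printable bytes is an illustrative choice rather than something this proposal specifies.

```typescript
// Hypothetical helpers for the row-detail flyout; not part of the proposal.
const B64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode standard base64 into raw byte values, stopping at "=" padding.
function decodeBase64(input: string): number[] {
  const bytes: number[] = [];
  let buffer = 0; // bit accumulator
  let bits = 0; // number of valid low bits currently held in `buffer`
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    if (ch === "=") break;
    const idx = B64_ALPHABET.indexOf(ch);
    if (idx < 0) throw new Error(`invalid base64 character: ${ch}`);
    buffer = ((buffer << 6) | idx) & 0xffff;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((buffer >> bits) & 0xff);
    }
  }
  return bytes;
}

// Render a key boundary as text: printable ASCII is kept as-is,
// every other byte is shown as a \xNN escape.
function keyLabel(b64: string): string {
  return decodeBase64(b64)
    .map((b) =>
      b >= 0x20 && b < 0x7f
        ? String.fromCharCode(b)
        : "\\x" + (b < 16 ? "0" : "") + b.toString(16),
    )
    .join("");
}
```

Decoding only on hover keeps the wire types as plain `string` and avoids work for rows the operator never inspects.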

The page is read-only and does not need the `full` role; both
`read_only` and `full` sessions can view it.

### 3.2 API client

Four exported types and one `api` method are added to `web/admin/src/api/client.ts`:

```ts
export type KeyVizSeries = "reads" | "writes" | "read_bytes" | "write_bytes";

export interface KeyVizRow {
  bucket_id: string;
  start: string; // base64 from Go []byte
  end: string; // base64 from Go []byte
  aggregate: boolean;
  route_ids?: number[];
  route_ids_truncated?: boolean;
  route_count: number;
  values: number[];
}

export interface KeyVizMatrix {
  column_unix_ms: number[];
  rows: KeyVizRow[];
  series: KeyVizSeries;
  generated_at: string;
}

export interface KeyVizParams {
  series?: KeyVizSeries;
  from_unix_ms?: number;
  to_unix_ms?: number;
  rows?: number;
}

// Sketch of the new entry on the exported `api` object.
api.keyVizMatrix = (params: KeyVizParams, signal?: AbortSignal) =>
  apiFetch<KeyVizMatrix>("/keyviz/matrix", { query: params, signal });
```

The query passes through `apiFetch`'s existing CSRF-free GET path; no
mutation route is needed for Phase 2-B.
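
For illustration, the sketch below shows how a `KeyVizParams` object could map onto the request URL. It assumes that unset fields are omitted from the query string and that paths are prefixed with `/admin/api/v1`; `keyVizMatrixUrl` is a hypothetical helper, not part of `client.ts`, and the real encoding is whatever `apiFetch` does with its `query` option.

```typescript
type KeyVizSeries = "reads" | "writes" | "read_bytes" | "write_bytes";

interface KeyVizParams {
  series?: KeyVizSeries;
  from_unix_ms?: number;
  to_unix_ms?: number;
  rows?: number;
}

// Hypothetical helper: build the GET URL for the matrix endpoint,
// skipping any parameter the caller left undefined (assumed behaviour).
function keyVizMatrixUrl(params: KeyVizParams, apiBase = "/admin/api/v1"): string {
  const entries: [string, string | number | undefined][] = [
    ["series", params.series],
    ["from_unix_ms", params.from_unix_ms],
    ["to_unix_ms", params.to_unix_ms],
    ["rows", params.rows],
  ];
  const pairs: string[] = [];
  for (let i = 0; i < entries.length; i++) {
    const key = entries[i][0];
    const value = entries[i][1];
    if (value === undefined) continue;
    pairs.push(`${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`);
  }
  const path = `${apiBase}/keyviz/matrix`;
  return pairs.length > 0 ? `${path}?${pairs.join("&")}` : path;
}
```

With no parameters the request falls back to the server defaults, which matches the page's behaviour of rendering whatever the handler returns.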

### 3.3 Routing and navigation

- `web/admin/src/App.tsx`: add `<Route path="keyviz" element={<KeyVizPage />} />`
alongside the existing dynamo / sqs / s3 routes.
- `web/admin/src/components/Layout.tsx`: add `{ to: "/keyviz", label: "Key Visualizer" }`
to `navItems`.

### 3.4 What this proposal does NOT do

- **No charting library.** Pure `<canvas>` + a fixed colour ramp. The
full matrix is at most 1024 rows × a few hundred columns; that fits
trivially on a single canvas without virtualisation. If we later
want zoom/pan, we'll revisit the dependency cost in a follow-up.
- **No auto-correlation with Routes / Raft Groups pages.** Those
pages are not yet built; correlation is a Phase 1 task and will be
added when those pages land.
- **No drill-down view.** Phase 3 territory (per-route sparkline +
hot-key preview labels). Out of scope.
- **No multi-node fan-out.** The handler is currently node-local (it
only sees the local sampler). Phase 2-C of the parent design will add
a fan-out admin RPC; this proposal renders whatever the handler
returns, and will pick up fan-out for free once that ships.

## 4. Heatmap rendering specifics

### 4.1 Colour mapping

Per design §4.1, the default series is `writes`. Cell value `v` is
normalised against the per-matrix max `M` (`v / M`, clamped to `[0,1]`)
and mapped through a perceptually-monotonic ramp. We will use a
hand-rolled 5-stop ramp (transparent → blue → green → yellow → red)
to avoid pulling in `d3-interpolate`. The ramp is in `lib/colorRamp.ts`
so a future swap is one file.

Empty cells (`v === 0`) render as the page background, not a faint blue
— this is critical for spotting actually-cold routes.
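
A minimal sketch of what `lib/colorRamp.ts` could look like, under stated assumptions: the RGB values of the five stops are illustrative picks (the proposal names only the hues), the stops are evenly spaced, and `normalise` / `rampColour` are hypothetical function names.

```typescript
// Illustrative 5-stop ramp: transparent -> blue -> green -> yellow -> red.
// The concrete channel values are assumptions, not taken from the proposal.
type Rgba = [number, number, number, number];

const RAMP_STOPS: Rgba[] = [
  [0, 0, 255, 0], // transparent
  [0, 0, 255, 1], // blue
  [0, 200, 0, 1], // green
  [255, 220, 0, 1], // yellow
  [255, 0, 0, 1], // red
];

// Normalise a cell value against the per-matrix max, clamped to [0, 1].
function normalise(v: number, max: number): number {
  if (max <= 0 || v <= 0) return 0;
  return Math.min(v / max, 1);
}

// Linearly interpolate between the two stops that bracket t.
function rampColour(t: number): string {
  const clamped = Math.min(Math.max(t, 0), 1);
  const scaled = clamped * (RAMP_STOPS.length - 1);
  const i = Math.min(Math.floor(scaled), RAMP_STOPS.length - 2);
  const f = scaled - i;
  const a = RAMP_STOPS[i];
  const b = RAMP_STOPS[i + 1];
  const mix = (k: number): number => a[k] + (b[k] - a[k]) * f;
  const r = Math.round(mix(0));
  const g = Math.round(mix(1));
  const bl = Math.round(mix(2));
  return `rgba(${r}, ${g}, ${bl}, ${mix(3).toFixed(2)})`;
}
```

The renderer would still skip `v === 0` cells entirely rather than rely on the transparent stop, so cold routes show the page background.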

### 4.2 Layout

Cell width: `min(8 px, container_width / column_count)`. Cell height:
`min(4 px, container_height / row_count)`. Cap row count at 1024 so
the canvas height never exceeds 4096 px (1024 rows × 4 px) even at the
maximum budget.

Time axis labels are formatted as `HH:mm:ss` from `column_unix_ms[i]`.
The stride between rendered ticks is `max(ceil(column_count / 10),
ceil(56 px / cellW))`, so adjacent labels never overlap at small cell
widths. At `cellW = 2 px`, a naive every-tenth-column stride would give
each ~54 px monospace label only 20 px of horizontal space.
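
The layout arithmetic above can be sketched as two small pure functions. `cellSize` and `tickStride` are hypothetical names, the zero-column and zero-row fallbacks are assumptions, and the 56 px label slot is taken from the stride formula.

```typescript
interface CellSize {
  cellW: number;
  cellH: number;
}

// Cell dimensions: at most 8 x 4 px, shrinking to fit the container.
function cellSize(
  containerW: number,
  containerH: number,
  columnCount: number,
  rowCount: number,
): CellSize {
  const cellW = columnCount > 0 ? Math.min(8, containerW / columnCount) : 8;
  const cellH = rowCount > 0 ? Math.min(4, containerH / rowCount) : 4;
  return { cellW, cellH };
}

// Number of columns between rendered time-axis ticks:
// max(ceil(column_count / 10), ceil(56 px / cellW)), never below 1.
function tickStride(columnCount: number, cellW: number, labelSlotPx = 56): number {
  const byCount = Math.ceil(columnCount / 10);
  const byWidth = Math.ceil(labelSlotPx / cellW);
  return Math.max(1, byCount, byWidth);
}
```

For example, 100 columns at `cellW = 2 px` give a stride of 28 rather than 10, so each label gets 56 px of horizontal space.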

No inline labels are drawn on the route (Y) axis. At `cellH = 2 px`
text would not fit, and at `cellH = 4 px` it would crowd into the
heatmap. Instead, hovering over a row reveals the full `bucket_id`,
key range, route count, and route IDs in a row-detail flyout below
the canvas — the flyout supersedes the inline label idea.

### 4.3 Performance budget

Phase 2 §10 sets a ≤120 ms render budget for a 1024×500 matrix. We
issue one `ctx.fillRect` per non-zero cell: the colour ramp runs
once per cell rather than per pixel, and zero-value cells short-circuit,
so the only work on a quiet matrix is the initial `clearRect`. We do
**not** use SVG (one element per cell would be roughly 512k DOM nodes at
the max), and we do **not** build an `ImageData` buffer (filling one for
a single `putImageData` call means iterating over every pixel of the
canvas, which is the opposite of what we want for a sparse matrix).
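
A minimal sketch of the per-cell draw loop, under stated assumptions: `CellPainter` is a hypothetical thin wrapper whose `clear` would call `ctx.clearRect` over the whole canvas and whose `fill` would set `ctx.fillStyle` and call `ctx.fillRect`; `drawMatrix` is likewise an illustrative name. Abstracting the context this way keeps the loop testable without a browser.

```typescript
// Hypothetical abstraction over a 2D canvas context.
interface CellPainter {
  clear(): void;
  fill(x: number, y: number, w: number, h: number, colour: string): void;
}

// Paint one rectangle per non-zero cell; returns how many cells were painted.
function drawMatrix(
  rows: number[][],
  cellW: number,
  cellH: number,
  colourFor: (t: number) => string,
  painter: CellPainter,
): number {
  // Per-matrix max used for normalisation.
  let max = 0;
  for (let r = 0; r < rows.length; r++) {
    for (let c = 0; c < rows[r].length; c++) {
      if (rows[r][c] > max) max = rows[r][c];
    }
  }

  painter.clear();
  if (max === 0) return 0; // quiet matrix: clearing is the only work

  let painted = 0;
  for (let r = 0; r < rows.length; r++) {
    for (let c = 0; c < rows[r].length; c++) {
      const v = rows[r][c];
      if (v <= 0) continue; // zero cells keep the page background
      painter.fill(c * cellW, r * cellH, cellW, cellH, colourFor(Math.min(v / max, 1)));
      painted++;
    }
  }
  return painted;
}
```

The cost scales with the number of non-zero cells, which is the property the sparse-matrix argument above relies on.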

### 4.4 Refresh

Auto-refresh polls `api.keyVizMatrix({ series, rows })` and re-renders.
The poll uses the same `useApiQuery` reload mechanism the other pages
use, so 401 → forced logout falls out for free.

5-second cadence is the lower bound; the sampler's flush is 1 s, so
polling faster would mostly redraw the same matrix. 30-second cadence
is for users leaving the tab open.

## 5. Testing

Phase 2-B is a pure-frontend change. The Go test suite is unchanged.

- **Manual verification** (recorded in the PR description):
  1. `cd web/admin && npm install && npm run build` produces
     `internal/admin/dist/index.html` containing the new bundle.
  2. `make run` starts the demo cluster; opening
     `http://127.0.0.1:8080/admin/` and navigating to **Key Visualizer**
     renders the heatmap.
  3. With no traffic, the heatmap shows the route grid in the
     background colour (no false-colour blue).
  4. With `make client` driving writes, hot routes light up red within
     ~5 s.
  5. The series picker switches the displayed counter; row-budget
     input clamps server-side at 1024.

- **Type check**: `npm run lint` (which is `tsc -b --noEmit`) is the
CI gate for the SPA.

- **Lint and unit tests for backend**: unchanged from existing CI
(`make lint`, `go test ./...`). No backend code changes in this
proposal.

## 6. Five-lens review checklist

Per `CLAUDE.md`, recorded for completeness even on a frontend change:

1. **Data loss** — n/a; SPA is read-only against an existing handler.
2. **Concurrency / distributed failures** — n/a; a single browser tab
polls a single handler instance. The handler itself is already
tested for concurrent observers.
3. **Performance** — Phase 2 §10 budget honoured by canvas +
`fillRect` per non-zero cell (see §4.3 for why we deliberately
avoid `putImageData`). No new dependency. Polling defaults to off.
4. **Data consistency** — The SPA renders whatever the handler
returns; consistency guarantees come from the existing sampler
(in-memory, leader-issued counters per Phase 2 design §5.1).
5. **Test coverage** — Type-check via `tsc -b --noEmit`. Manual
verification steps documented in §5; KeyViz is the kind of feature
where a screenshot or video in the PR description is more useful
than a unit test.

## 7. Lifecycle

- Land this doc and the implementation in the same PR (doc commit
first, then implementation).
- On merge: update the phase table in
`docs/admin_ui_key_visualizer_design.md` to mark 2-B (SPA) as shipped,
and rename this doc from `*_proposed_*` to `*_implemented_*` once the
parent design's Phase 2-C fan-out item also ships.

## 8. Open questions

1. Should the row-budget input be free-form (any integer ≤ 1024) or
stepped (256 / 512 / 1024)? Proposing free-form for ergonomics; the
server clamps anyway.
2. Should the page remember series + rows + auto-refresh in
`localStorage`? Probably yes, but punt to a follow-up — the URL
query can carry the same state for now if needed.
3. Should the ramp be colour-blind-safe by default (e.g., viridis)?
Worth doing eventually; for Phase 2-B the operator audience is
small enough that a follow-up swap is acceptable.
2 changes: 2 additions & 0 deletions web/admin/src/App.tsx
@@ -5,6 +5,7 @@ import { RequireAuth } from "./components/RequireAuth";
import { DashboardPage } from "./pages/Dashboard";
import { DynamoDetailPage } from "./pages/DynamoDetail";
import { DynamoListPage } from "./pages/DynamoList";
import { KeyVizPage } from "./pages/KeyViz";
import { LoginPage } from "./pages/Login";
import { NotFoundPage } from "./pages/NotFound";
import { S3DetailPage } from "./pages/S3Detail";
@@ -31,6 +32,7 @@ export function App() {
<Route path="sqs/:name" element={<SqsDetailPage />} />
<Route path="s3" element={<S3ListPage />} />
<Route path="s3/:name" element={<S3DetailPage />} />
<Route path="keyviz" element={<KeyVizPage />} />
<Route path="*" element={<NotFoundPage />} />
</Route>
</Routes>
41 changes: 41 additions & 0 deletions web/admin/src/api/client.ts
@@ -216,6 +216,37 @@ export interface SqsQueueList {
  queues: string[];
}

// KeyViz wire shapes mirror internal/admin/keyviz_handler.go
// (KeyVizMatrix / KeyVizRow). Go []byte fields arrive as
// base64-encoded strings via encoding/json — keep them as `string` on
// the client and decode lazily where preview labels need raw bytes.
export type KeyVizSeries = "reads" | "writes" | "read_bytes" | "write_bytes";

export interface KeyVizRow {
  bucket_id: string;
  start: string;
  end: string;
  aggregate: boolean;
  route_ids?: number[];
  route_ids_truncated?: boolean;
  route_count: number;
  values: number[];
}

export interface KeyVizMatrix {
  column_unix_ms: number[];
  rows: KeyVizRow[];
  series: KeyVizSeries;
  generated_at: string;
}

export interface KeyVizParams {
  series?: KeyVizSeries;
  from_unix_ms?: number;
  to_unix_ms?: number;
  rows?: number;
}

export const api = {
  login: (access_key: string, secret_key: string) =>
    apiFetch<LoginResponse>("/auth/login", {
@@ -252,4 +283,14 @@ export const api = {
    apiFetch<SqsQueueSummary>(`/sqs/queues/${encodeURIComponent(name)}`, { signal }),
  deleteQueue: (name: string) =>
    apiFetch<void>(`/sqs/queues/${encodeURIComponent(name)}`, { method: "DELETE" }),
  keyVizMatrix: (params: KeyVizParams, signal?: AbortSignal) =>
    apiFetch<KeyVizMatrix>("/keyviz/matrix", {
      query: {
        series: params.series,
        from_unix_ms: params.from_unix_ms,
        to_unix_ms: params.to_unix_ms,
        rows: params.rows,
      },
      signal,
    }),
};
1 change: 1 addition & 0 deletions web/admin/src/components/Layout.tsx
@@ -6,6 +6,7 @@ const navItems: { to: string; label: string; end?: boolean }[] = [
{ to: "/dynamo", label: "DynamoDB" },
{ to: "/sqs", label: "SQS" },
{ to: "/s3", label: "S3" },
{ to: "/keyviz", label: "Key Visualizer" },
];

export function Layout() {