
Commit 98d38b5

feat(admin): KeyViz heatmap page in web/admin SPA (Phase 2-B) (#680)
## Summary

Phase 2-B of the Key Visualizer design (`docs/admin_ui_key_visualizer_design.md`): integrate the heatmap into the existing `web/admin/` SPA rather than building a parallel dashboard. Doc + implementation in one PR (doc commit first, per CLAUDE.md design-doc-first workflow).

- New page at `/keyviz`: canvas heatmap polling `/admin/api/v1/keyviz/matrix` with series picker (writes / reads / write_bytes / read_bytes), row-budget input (clamped at 1024), and off / 5 s / 30 s auto-refresh.
- Cold cells (value 0) render as the page background, not a faint blue, so spotting actually-cold routes stays the dominant visual signal.
- Row-detail flyout on hover: bucket_id, start, end, aggregate, route_count, route_ids (with truncation indicator). `start` / `end` decode through a printable-or-hex preview so binary keys do not render as mojibake.
- Dependency-free: hand-rolled five-stop colour ramp in `lib/colorRamp.ts`. No d3, no ECharts. Bundle grew from ~155 kB to ~208 kB (raw), 64 kB gzipped.

Backend is unchanged. The handler at `/admin/api/v1/keyviz/matrix` and its sampler wiring already shipped under Phase 2-A (PRs #639 / #645 / #646 / #647 / #651 / #660 / #661 / #672).

## Design

`docs/design/2026_04_27_proposed_keyviz_spa_integration.md` (proposed status, lands with this PR). Parent design §12 phase table is split into 2-A (server, shipped) / 2-B (SPA, this PR) / 2-C (cluster fan-out, open).

## Five-lens self-review

1. **Data loss**: n/a; SPA is read-only against an existing handler.
2. **Concurrency / distributed**: n/a; single browser tab polling a single handler. Sampler concurrency was already covered by Phase 2-A tests.
3. **Performance**: Canvas + per-cell `fillRect` runs under §10 budget at 1024 × 500. Auto-refresh defaults off; 5-second cadence is the lower bound (sampler flush is 1 s).
4. **Data consistency**: SPA renders whatever the handler returns; consistency guarantees come from the existing leader-issued counters in the sampler.
5. **Test coverage**: `tsc -b --noEmit` clean; `vite build` clean; `go build ./internal/admin/...` clean (embed glob unaffected); `go test ./internal/admin/...` clean. Manual verification documented in the design doc §5.

## Test plan

- [x] `npm run lint` (`tsc -b --noEmit`): clean
- [x] `npm run build` (Vite): clean, output goes to `internal/admin/dist`
- [x] `go build ./internal/admin/...`: clean
- [x] `go test ./internal/admin/...`: clean
- [ ] Manual: `make run` + `make client`, navigate to `/keyviz`, see hot routes light up red within ~5 s of write traffic
- [ ] Manual: series picker swaps the displayed counter; row-budget input clamps at 1024; auto-refresh polls without flicker

## Out of scope

- **Cluster fan-out**: handler is currently node-local. Phase 2-C will add a cross-node admin RPC; this PR will pick up the aggregate view automatically once that ships.
- **Drill-down per-route sparkline**: Phase 3.
- **Routes / Raft Groups correlation**: Phase 1 SPA pages not yet built; correlation lands when those pages do.
- **`localStorage` for series / rows / refresh**: punt to follow-up.
2 parents 8c38ada + a606df1 commit 98d38b5

7 files changed

Lines changed: 763 additions & 3 deletions


docs/admin_ui_key_visualizer_design.md

Lines changed: 5 additions & 3 deletions
@@ -319,11 +319,13 @@ Because writes are recorded by Raft leaders and follower-local reads are recorde
 |---|---|---|
 | 0 | `cmd/elastickv-admin` skeleton, token-protected `Admin` gRPC service stub, empty SPA shell, CI wiring. | Binary builds, `/api/cluster/overview` returns live data from a real node only when the configured admin token is supplied. |
 | 1 | Overview, Routes, Raft Groups, Adapters pages. `LiveSummary` added. No sampler. | All read-only pages match `grpcurl` ground truth. |
-| 2 | Key Visualizer MVP: in-memory sampler with adaptive sub-sampling, leader writes, leader/follower reads, fan-out across nodes, static matrix API with virtual-bucket metadata. | Benchmark gate green; heatmap shows synthetic hotspot within 2 s of load; ±5% / 95%-CI accuracy SLO holds under synthetic bursts; fan-out returns complete view with 1 node down. |
-| 3 | Bytes series, drill-down, split/merge continuity, namespace-isolated persistence of compacted columns distributed **per owning Raft group**, lineage recovery, and retention GC. | Heatmap remains continuous across a live `SplitRange`; restart preserves last 7 days; expired data and stale lineage records are collected; no single Raft group sees more than its share of KeyViz writes. |
+| 2-A | Key Visualizer MVP server side: in-memory sampler with adaptive sub-sampling, leader writes, leader/follower reads, static matrix API with virtual-bucket metadata. | Benchmark gate green; ±5% / 95%-CI accuracy SLO holds under synthetic bursts; matrix endpoint returns the local node's view. |
+| 2-B | KeyViz SPA integration into `web/admin/`: heatmap page, series picker, row budget, manual + auto refresh. See `docs/design/2026_04_27_proposed_keyviz_spa_integration.md`. | Heatmap shows synthetic hotspot within ~5 s of `make client` driving traffic against `make run`; type check (`tsc -b --noEmit`) clean. |
+| 2-C | Cluster fan-out: admin RPC that aggregates each node's local sampler view so the SPA shows a cluster-wide heatmap rather than the local node's slice. | Fan-out returns complete view with 1 node down; SPA renders aggregate within the §10 budget. |
+| 3 | Drill-down, split/merge continuity, namespace-isolated persistence of compacted columns distributed **per owning Raft group**, lineage recovery, and retention GC. | Heatmap remains continuous across a live `SplitRange`; restart preserves last 7 days; expired data and stale lineage records are collected; no single Raft group sees more than its share of KeyViz writes. |
 | 4 (deferred) | Mutating admin operations (`SplitRange` from UI), browser login, RBAC, and identity-provider integration. Out of scope for this design; a follow-up design will cover it. ||
 
-Phases 0–2 are the minimum operationally useful product; Phase 3 is the "ship-quality" target.
+Phases 0–2 (A/B/C) together are the minimum operationally useful product; Phase 3 is the "ship-quality" target. As of 2026-04-27, Phase 2-A is shipped (PRs #639/#645/#646/#647/#651/#660/#661/#672), Phase 2-B lands with this proposal, and Phase 2-C is open. Bytes series, originally listed under Phase 3, was rolled forward into 2-A and is already on the wire.
 
 ## 13. Open Questions
 

docs/design/2026_04_27_proposed_keyviz_spa_integration.md

Lines changed: 259 additions & 0 deletions
@@ -0,0 +1,259 @@
+---
+status: proposed
+phase: 2-B
+parent_design: docs/admin_ui_key_visualizer_design.md
+date: 2026-04-27
+---
+
+# KeyViz SPA Integration (Phase 2-B)
+
+## 1. Background
+
+Phase 2 of the Key Visualizer design (`docs/admin_ui_key_visualizer_design.md`)
+landed the **server side** end-to-end:
+
+- `keyviz.MemSampler` with COW route table, ring-buffer history, and
+  bytes counters (PR #639).
+- `ShardedCoordinator` write- and read-path observation (PR #645,
+  PR #661).
+- `adapter.AdminServer.GetKeyVizMatrix` gRPC RPC (PR #646).
+- `internal/admin` HTTP handler at `/admin/api/v1/keyviz/matrix`
+  (PR #660 + PR #672 follow-up).
+- `main.go` end-to-end wiring (PR #647 / PR #651).
+
+The remaining piece is the **frontend**: the admin SPA at `web/admin/`
+already serves Overview / DynamoDB / SQS / S3, but has no KeyViz page.
+This doc proposes Phase 2-B: integrate the heatmap into the existing
+SPA rather than building a separate dashboard.
+
+## 2. Why integrate, not build separately
+
+The original §3 of the parent design left open the question of where
+the SPA lives. Inventory of what already exists:
+
+- `web/admin/` is a Vite + React 18 + TypeScript + Tailwind SPA, built
+  into `internal/admin/dist` and embedded via `embed.go`.
+- It is served from the same Go process as the API (`internal/admin`),
+  on the same admin listener, so there is **no second origin**.
+- Auth is HttpOnly `admin_session` cookie + double-submit `admin_csrf`
+  cookie, applied uniformly by `apiFetch` in `src/api/client.ts`.
+- The KeyViz HTTP handler is already mounted on the same `apiBase`
+  (`/admin/api/v1`) and the same authn / CSRF middleware stack.
+- Layout + nav (`src/components/Layout.tsx`) is already a list-driven
+  pattern, so adding a tab is one entry.
+
+Building a second SPA would duplicate:
+
+- the Vite / Tailwind / ESLint / tsconfig toolchain,
+- auth, session, CSRF, and 401 redirect logic (`auth.tsx` + `useApi.ts`),
+- the embed pipeline (`internal/admin/embed.go` + `dist` glob),
+- the cookie origin (a separate origin would force CORS or a reverse
+  proxy hack just to read `admin_session`).
+
+Net cost of integration is **three new files plus three line edits**.
+Net cost of a parallel SPA is on the order of weeks of toolchain and
+auth re-plumbing, with no upside the user would observe.
+
+**Decision: integrate into `web/admin/`.**
+
+## 3. Surface area
+
+### 3.1 New page
+
+`web/admin/src/pages/KeyViz.tsx` mounted at route `/keyviz`. The page
+contains:
+
+- A header with the series picker (`writes` / `reads` / `write_bytes` /
+  `read_bytes`), a row-budget input (default 1024, capped server-side),
+  a refresh button, and a small "auto-refresh: off / 5 s / 30 s" toggle.
+- The heatmap canvas itself: `<canvas>` rendered from the `Values[][]`
+  matrix the API returns. Rows on the Y axis are routes (one per
+  `KeyVizRow`), columns on the X axis are time bins from
+  `ColumnUnixMs`. Cell colour intensity is normalised against the
+  per-matrix max so a quiet column does not look identical to a hot
+  one.
+- A row-detail flyout: hovering over a row reveals `bucket_id`,
+  `start`, `end`, `aggregate`, `route_count`, and (when present)
+  `route_ids` with a `route_ids_truncated` indicator.
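The PR summary says the flyout shows `start` / `end` through a printable-or-hex preview. A minimal sketch of that idea follows, assuming the rule "every byte in printable ASCII renders as text, anything else renders as hex". `decodeBase64`, `previewKey`, and the `(empty)` label are hypothetical names and choices for illustration, not the helpers shipped in this PR.

```typescript
// Hypothetical helpers: names, the printable rule, and the "(empty)" label
// are assumptions, not the code in web/admin.
const B64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode standard padded base64, the encoding Go's encoding/json uses for []byte.
export function decodeBase64(b64: string): number[] {
  const out: number[] = [];
  let buffer = 0;
  let bits = 0;
  for (let i = 0; i < b64.length; i++) {
    const ch = b64[i];
    if (ch === "=") break; // padding marks the end of the data
    const v = B64_ALPHABET.indexOf(ch);
    if (v < 0) continue; // ignore characters outside the alphabet
    buffer = ((buffer << 6) | v) & 0xffff; // keep only the bits still needed
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out.push((buffer >> bits) & 0xff);
    }
  }
  return out;
}

// Render a key as text when every byte is printable ASCII, otherwise as hex.
export function previewKey(b64: string): string {
  const bytes = decodeBase64(b64);
  if (bytes.length === 0) return "(empty)";
  const printable = bytes.every((b) => b >= 0x20 && b <= 0x7e);
  if (printable) return String.fromCharCode(...bytes);
  return "0x" + bytes.map((b) => (b < 16 ? "0" : "") + b.toString(16)).join("");
}
```

With this sketch, `previewKey("dXNlcjox")` yields `user:1`, while a binary key such as `AP8=` (bytes `00 ff`) yields `0x00ff` instead of unreadable characters.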
+
+The page is read-only and does not need the `full` role; both
+`read_only` and `full` sessions can view it.
+
+### 3.2 API client
+
+Three additions to `web/admin/src/api/client.ts`:
+
+```ts
+export type KeyVizSeries = "reads" | "writes" | "read_bytes" | "write_bytes";
+
+export interface KeyVizRow {
+  bucket_id: string;
+  start: string; // base64 from Go []byte
+  end: string;
+  aggregate: boolean;
+  route_ids?: number[];
+  route_ids_truncated?: boolean;
+  route_count: number;
+  values: number[];
+}
+
+export interface KeyVizMatrix {
+  column_unix_ms: number[];
+  rows: KeyVizRow[];
+  series: KeyVizSeries;
+  generated_at: string;
+}
+
+export interface KeyVizParams {
+  series?: KeyVizSeries;
+  from_unix_ms?: number;
+  to_unix_ms?: number;
+  rows?: number;
+}
+
+api.keyVizMatrix = (params: KeyVizParams, signal?: AbortSignal) =>
+  apiFetch<KeyVizMatrix>("/keyviz/matrix", { query: params, signal });
+```
+
+The query passes through `apiFetch`'s existing CSRF-free GET path; no
+mutation route is needed for Phase 2-B.
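To illustrate how optional `KeyVizParams` fields might turn into a GET query string, here is a small sketch. `buildKeyVizQuery` is a hypothetical helper; how `apiFetch` actually serialises its `query` option is not shown in this diff, so treat the "omit undefined fields" behaviour as an assumption.

```typescript
type KeyVizSeries = "reads" | "writes" | "read_bytes" | "write_bytes";

interface KeyVizParams {
  series?: KeyVizSeries;
  from_unix_ms?: number;
  to_unix_ms?: number;
  rows?: number;
}

// Hypothetical helper: serialise only the fields the caller actually set,
// so the server falls back to its own defaults for the rest.
export function buildKeyVizQuery(params: KeyVizParams): string {
  const entries: [string, string | number | undefined][] = [
    ["series", params.series],
    ["from_unix_ms", params.from_unix_ms],
    ["to_unix_ms", params.to_unix_ms],
    ["rows", params.rows],
  ];
  const parts: string[] = [];
  for (const [key, value] of entries) {
    if (value === undefined) continue; // unset field: leave it out of the URL
    parts.push(`${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`);
  }
  return parts.length > 0 ? "?" + parts.join("&") : "";
}
```

For example, `buildKeyVizQuery({ series: "writes", rows: 1024 })` produces `?series=writes&rows=1024`, and an empty params object produces an empty string.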
+
+### 3.3 Routing and navigation
+
+- `web/admin/src/App.tsx`: add `<Route path="keyviz" element={<KeyVizPage />} />`
+  alongside the existing dynamo / sqs / s3 routes.
+- `web/admin/src/components/Layout.tsx`: add `{ to: "/keyviz", label: "Key Visualizer" }`
+  to `navItems`.
+
+### 3.4 What this proposal does NOT do
+
+- **No charting library.** Pure `<canvas>` + a fixed colour ramp. The
+  full matrix is at most 1024 rows × a few hundred columns; that fits
+  trivially on a single canvas without virtualisation. If we later
+  want zoom/pan, we'll revisit the dependency cost in a follow-up.
+- **No auto-correlation with Routes / Raft Groups pages.** Those
+  pages are not yet built; correlation is a Phase 1 task and will be
+  added when those pages land.
+- **No drill-down view.** Phase 3 territory (per-route sparkline +
+  hot-key preview labels). Out of scope.
+- **No multi-node fan-out.** The handler is currently node-local (it
+  only sees the local sampler). A separate Phase 2-C item will add a
+  fan-out admin RPC; this proposal renders whatever the handler
+  returns, and will pick up fan-out for free once that ships.
+
+## 4. Heatmap rendering specifics
+
+### 4.1 Colour mapping
+
+Per design §4.1, the default series is `writes`. Cell value `v` is
+normalised against the per-matrix max `M` (`v / M`, clamped to `[0,1]`)
+and mapped through a perceptually-monotonic ramp. We will use a
+hand-rolled 5-stop ramp (transparent → blue → green → yellow → red)
+to avoid pulling in `d3-interpolate`. The ramp is in `lib/colorRamp.ts`
+so a future swap is one file.
+
+Empty cells (`v === 0`) render as the page background, not a faint blue;
+this is critical for spotting actually-cold routes.
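The normalise-then-ramp step described above can be sketched as follows. The stop colours, the linear interpolation between stops, and the names `normalise` / `rampColour` are assumptions for illustration; the real values live in `lib/colorRamp.ts`, which is not shown in this diff.

```typescript
type RGBA = [number, number, number, number];

// Assumed stop colours for transparent -> blue -> green -> yellow -> red.
const STOPS: RGBA[] = [
  [0, 0, 255, 0], // transparent (blue at zero alpha, so the fade-in stays blue)
  [0, 0, 255, 1], // blue
  [0, 200, 0, 1], // green
  [255, 220, 0, 1], // yellow
  [255, 0, 0, 1], // red
];

// v / M clamped to [0, 1]; a zero or missing max maps everything to 0.
export function normalise(v: number, max: number): number {
  if (!(max > 0) || !(v > 0)) return 0;
  return Math.min(1, v / max);
}

// Map t in [0, 1] to a CSS colour; null means "leave the page background".
export function rampColour(t: number): string | null {
  if (!(t > 0)) return null; // cold cell: draw nothing
  const scaled = Math.min(1, t) * (STOPS.length - 1);
  const i = Math.min(Math.floor(scaled), STOPS.length - 2);
  const f = scaled - i; // position between stop i and stop i + 1
  const a = STOPS[i];
  const b = STOPS[i + 1];
  const mix = (x: number, y: number): number => x + (y - x) * f;
  const r = Math.round(mix(a[0], b[0]));
  const g = Math.round(mix(a[1], b[1]));
  const bl = Math.round(mix(a[2], b[2]));
  return `rgba(${r}, ${g}, ${bl}, ${mix(a[3], b[3]).toFixed(2)})`;
}
```

Returning `null` for zero rather than a fully transparent colour lets the caller skip the draw call entirely, which matches the "cold cells render as the page background" rule.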
+
+### 4.2 Layout
+
+Cell width: `min(8 px, container_width / column_count)`. Cell height:
+`min(4 px, container_height / row_count)`. Cap row count at 1024 so
+the canvas height stays at or below 4096 px (1024 rows × 4 px) even at the maximum budget.
+
+Time axis labels are formatted as `HH:mm:ss` from `column_unix_ms[i]`.
+The stride between rendered ticks is `max(ceil(column_count / 10),
+ceil(56 px / cellW))` so adjacent labels never overlap at small cell
+widths. At `cellW = 2 px`, a naive every-tenth stride would pack
+~54 px of monospace label into 2 px of horizontal space.
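The cell-size and tick-stride formulas above translate directly into a small pure function. This is a sketch under the stated constants (8 px, 4 px, 56 px); `computeLayout` and `HeatmapLayout` are hypothetical names, and the guard for a zero-width container is an added assumption.

```typescript
export interface HeatmapLayout {
  cellW: number;
  cellH: number;
  tickStride: number; // draw a time label every `tickStride` columns
}

const MAX_CELL_W = 8; // px
const MAX_CELL_H = 4; // px
const LABEL_PX = 56; // horizontal space reserved per HH:mm:ss label

export function computeLayout(
  containerW: number,
  containerH: number,
  columnCount: number,
  rowCount: number,
): HeatmapLayout {
  const cols = Math.max(1, columnCount);
  const rows = Math.max(1, rowCount);
  const cellW = Math.min(MAX_CELL_W, containerW / cols);
  const cellH = Math.min(MAX_CELL_H, containerH / rows);
  // Assumption: with no horizontal space at all, show at most one label.
  const labelStride = cellW > 0 ? Math.ceil(LABEL_PX / cellW) : cols;
  const tickStride = Math.max(Math.ceil(cols / 10), labelStride);
  return { cellW, cellH, tickStride };
}
```

For a 1000 px wide container with 500 columns, `cellW` is 2 px and the stride is `max(50, 28) = 50`; for 800 px with 60 columns, `cellW` is capped at 8 px and the stride is `max(6, 7) = 7`, so the label-width term is the one that prevents overlap.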
+
+No inline labels are drawn on the route (Y) axis. At `cellH = 2 px`
+text would not fit, and at `cellH = 4 px` it would crowd into the
+heatmap. Instead, hovering over a row reveals the full `bucket_id`,
+key range, route count, and route IDs in a row-detail flyout below
+the canvas; the flyout supersedes the inline label idea.
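Because rows have a uniform height, finding the hovered row is a single division. The helper below is a hypothetical sketch (`rowAtOffset` is not a name from the PR); it assumes the Y offset is measured from the top edge of the canvas, as `MouseEvent.offsetY` is for an unscaled canvas.

```typescript
// Hypothetical helper: map a pointer Y offset inside the canvas to a row
// index, or null when the pointer is outside the drawn rows.
export function rowAtOffset(
  offsetY: number,
  cellH: number,
  rowCount: number,
): number | null {
  if (!(cellH > 0) || rowCount <= 0 || offsetY < 0) return null;
  const index = Math.floor(offsetY / cellH);
  return index < rowCount ? index : null;
}
```

A `mousemove` handler could call `rowAtOffset(event.offsetY, cellH, matrix.rows.length)` and show the flyout for that row, hiding it when the result is `null`.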
+
+### 4.3 Performance budget
+
+Phase 2 §10 sets a ≤120 ms render budget for a 1024×500 matrix. We
+issue one `ctx.fillRect` per non-zero cell: the colour ramp runs
+once per cell rather than per pixel, and zero-value cells short-circuit
+so the only work on a quiet matrix is the initial `clearRect`. We do
+**not** use SVG (one element per cell would be 500k DOM nodes at the
+max), and we do **not** build an `ImageData` buffer (a single
+`putImageData` would force per-pixel iteration over the larger axis,
+which is the opposite of what we want for a sparse matrix).
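The render loop described above might look like the sketch below. `drawHeatmap` and `FillContext` are hypothetical names, and the colour function is passed in rather than imported because `lib/colorRamp.ts` is not shown in this diff. `FillContext` lists only the members the loop uses, so a real `CanvasRenderingContext2D` or a test fake can both be passed.

```typescript
// Minimal slice of the 2D canvas context that the loop needs.
export interface FillContext {
  fillStyle: unknown;
  clearRect(x: number, y: number, w: number, h: number): void;
  fillRect(x: number, y: number, w: number, h: number): void;
}

// Draws one rectangle per non-zero cell and returns how many were drawn.
export function drawHeatmap(
  ctx: FillContext,
  rows: { values: number[] }[],
  cellW: number,
  cellH: number,
  colourFor: (t: number) => string | null, // null means "leave background"
): number {
  let max = 0;
  let columns = 0;
  for (const row of rows) {
    columns = Math.max(columns, row.values.length);
    for (const v of row.values) if (v > max) max = v;
  }
  ctx.clearRect(0, 0, columns * cellW, rows.length * cellH);
  if (max === 0) return 0; // quiet matrix: the clear is the only work
  let drawn = 0;
  for (let y = 0; y < rows.length; y++) {
    const values = rows[y].values;
    for (let x = 0; x < values.length; x++) {
      const v = values[x];
      if (!(v > 0)) continue; // zero cells short-circuit
      const colour = colourFor(Math.min(1, v / max));
      if (colour === null) continue;
      ctx.fillStyle = colour;
      ctx.fillRect(x * cellW, y * cellH, cellW, cellH);
      drawn++;
    }
  }
  return drawn;
}
```

Returning the draw count is only a convenience for checking the sparse-matrix behaviour: an all-zero matrix results in one `clearRect` and zero `fillRect` calls.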
+
+### 4.4 Refresh
+
+Auto-refresh polls `api.keyVizMatrix({ series, rows })` and re-renders.
+The poll uses the same `useApiQuery` reload mechanism the other pages
+use, so 401 → forced logout falls out for free.
+
+5-second cadence is the lower bound; the sampler's flush is 1 s, so
+polling faster would mostly redraw the same matrix. 30-second cadence
+is for users leaving the tab open.
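The off / 5 s / 30 s toggle can be modelled as a small timer wrapper. This is a sketch only: `startAutoRefresh`, `RefreshMode`, and the injected `Timers` interface are hypothetical, and `reload` stands in for whatever reload callback `useApiQuery` exposes, which is not shown in this diff.

```typescript
export type RefreshMode = "off" | "5s" | "30s";

const INTERVAL_MS: Record<RefreshMode, number> = {
  off: 0,
  "5s": 5000,
  "30s": 30000,
};

// Timer functions are injected so the sketch has no DOM or Node dependency.
export interface Timers {
  setInterval(fn: () => void, ms: number): unknown;
  clearInterval(handle: unknown): void;
}

// Starts polling for the given mode and returns a function that stops it.
export function startAutoRefresh(
  mode: RefreshMode,
  reload: () => void,
  timers: Timers,
): () => void {
  const ms = INTERVAL_MS[mode];
  if (ms === 0) return () => {}; // "off": never schedule anything
  const handle = timers.setInterval(reload, ms);
  return () => timers.clearInterval(handle);
}
```

In a React page the returned stop function would fit naturally as the cleanup of a `useEffect` keyed on the selected mode, so switching the toggle cancels the old interval before starting a new one.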
+
+## 5. Testing
+
+Phase 2-B is a pure-frontend change. The Go test suite is unchanged.
+
+- **Manual verification** (recorded in the PR description):
+  1. `cd web/admin && npm install && npm run build` produces
+     `internal/admin/dist/index.html` containing the new bundle.
+  2. `make run` starts the demo cluster; opening
+     `http://127.0.0.1:8080/admin/` and navigating to **Key Visualizer**
+     renders the heatmap.
+  3. With no traffic, the heatmap shows the route grid in the
+     background colour (no false-colour blue).
+  4. With `make client` driving writes, hot routes light up red within
+     ~5 s.
+  5. The series picker switches the displayed counter; row-budget
+     input clamps server-side at 1024.
+
+- **Type check**: `npm run lint` (which is `tsc -b --noEmit`) is the
+  CI gate for the SPA.
+
+- **Lint and unit tests for backend**: unchanged from existing CI
+  (`make lint`, `go test ./...`). No backend code changes in this
+  proposal.
+
+## 6. Five-lens review checklist
+
+Per `CLAUDE.md`, recorded for completeness even on a frontend change:
+
+1. **Data loss**: n/a; SPA is read-only against an existing handler.
+2. **Concurrency / distributed failures**: n/a; a single browser tab
+   polls a single handler instance. The handler itself is already
+   tested for concurrent observers.
+3. **Performance**: Phase 2 §10 budget honoured by canvas +
+   `fillRect` per non-zero cell (see §4.3 for why we deliberately
+   avoid `putImageData`). No new dependency. Polling defaults to off.
+4. **Data consistency**: The SPA renders whatever the handler
+   returns; consistency guarantees come from the existing sampler
+   (in-memory, leader-issued counters per Phase 2 design §5.1).
+5. **Test coverage**: Type-check via `tsc -b --noEmit`. Manual
+   verification steps documented in §5; KeyViz is the kind of feature
+   where a screenshot or video in the PR description is more useful
+   than a unit test.
+
+## 7. Lifecycle
+
+- Land this doc and the implementation in the same PR (doc commit
+  first, then implementation).
+- On merge: update the phase table in `docs/admin_ui_key_visualizer_design.md`
+  (formerly one "Phase 2 KeyViz MVP" row) to mark 2-B (SPA) as shipped, and
+  rename this doc from `*_proposed_*` to `*_implemented_*` once the
+  parent design's Phase 2-C fan-out item also ships.
+
+## 8. Open questions
+
+1. Should the row-budget input be free-form (any integer ≤ 1024) or
+   stepped (256 / 512 / 1024)? Proposing free-form for ergonomics; the
+   server clamps anyway.
+2. Should the page remember series + rows + auto-refresh in
+   `localStorage`? Probably yes, but punt to a follow-up; the URL
+   query can carry the same state for now if needed.
+3. Should we make the ramp colour-blind-safe by default (e.g., viridis)?
+   Worth doing eventually; for Phase 2-B the operator audience is
+   small enough that a follow-up swap is acceptable.

web/admin/src/App.tsx

Lines changed: 2 additions & 0 deletions
@@ -5,6 +5,7 @@ import { RequireAuth } from "./components/RequireAuth";
 import { DashboardPage } from "./pages/Dashboard";
 import { DynamoDetailPage } from "./pages/DynamoDetail";
 import { DynamoListPage } from "./pages/DynamoList";
+import { KeyVizPage } from "./pages/KeyViz";
 import { LoginPage } from "./pages/Login";
 import { NotFoundPage } from "./pages/NotFound";
 import { S3DetailPage } from "./pages/S3Detail";
@@ -31,6 +32,7 @@ export function App() {
     <Route path="sqs/:name" element={<SqsDetailPage />} />
     <Route path="s3" element={<S3ListPage />} />
     <Route path="s3/:name" element={<S3DetailPage />} />
+    <Route path="keyviz" element={<KeyVizPage />} />
     <Route path="*" element={<NotFoundPage />} />
   </Route>
 </Routes>

web/admin/src/api/client.ts

Lines changed: 41 additions & 0 deletions
@@ -216,6 +216,37 @@ export interface SqsQueueList {
   queues: string[];
 }
 
+// KeyViz wire shapes mirror internal/admin/keyviz_handler.go
+// (KeyVizMatrix / KeyVizRow). Go []byte fields arrive as
+// base64-encoded strings via encoding/json; keep them as `string` on
+// the client and decode lazily where preview labels need raw bytes.
+export type KeyVizSeries = "reads" | "writes" | "read_bytes" | "write_bytes";
+
+export interface KeyVizRow {
+  bucket_id: string;
+  start: string;
+  end: string;
+  aggregate: boolean;
+  route_ids?: number[];
+  route_ids_truncated?: boolean;
+  route_count: number;
+  values: number[];
+}
+
+export interface KeyVizMatrix {
+  column_unix_ms: number[];
+  rows: KeyVizRow[];
+  series: KeyVizSeries;
+  generated_at: string;
+}
+
+export interface KeyVizParams {
+  series?: KeyVizSeries;
+  from_unix_ms?: number;
+  to_unix_ms?: number;
+  rows?: number;
+}
+
 export const api = {
   login: (access_key: string, secret_key: string) =>
     apiFetch<LoginResponse>("/auth/login", {
@@ -252,4 +283,14 @@ export const api = {
     apiFetch<SqsQueueSummary>(`/sqs/queues/${encodeURIComponent(name)}`, { signal }),
   deleteQueue: (name: string) =>
     apiFetch<void>(`/sqs/queues/${encodeURIComponent(name)}`, { method: "DELETE" }),
+  keyVizMatrix: (params: KeyVizParams, signal?: AbortSignal) =>
+    apiFetch<KeyVizMatrix>("/keyviz/matrix", {
+      query: {
+        series: params.series,
+        from_unix_ms: params.from_unix_ms,
+        to_unix_ms: params.to_unix_ms,
+        rows: params.rows,
+      },
+      signal,
+    }),
 };

web/admin/src/components/Layout.tsx

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ const navItems: { to: string; label: string; end?: boolean }[] = [
   { to: "/dynamo", label: "DynamoDB" },
   { to: "/sqs", label: "SQS" },
   { to: "/s3", label: "S3" },
+  { to: "/keyviz", label: "Key Visualizer" },
 ];
 
 export function Layout() {
