feat(kv): wire keyviz.Sampler into ShardedCoordinator dispatch path#645
feat(kv): wire keyviz.Sampler into ShardedCoordinator dispatch path#645
Conversation
Wires the keyviz.Sampler from PR #639 into ShardedCoordinator (design doc §5.1): - Adds an unexported `sampler keyviz.Sampler` field plus `WithSampler(s keyviz.Sampler) *ShardedCoordinator` post- construction option, mirroring WithLeaseReadObserver. - groupMutations now calls `c.observeMutation(routeID, mut)` once per resolved (RouteID, mutation) pair before grouping by GroupID. Reads are not dispatched through this path; all calls use OpWrite. - observeMutation guards against an interface-nil sampler at the call site; the keyviz contract also tolerates a typed-nil *MemSampler, so disabled keyviz wires through to a no-op without branching deep in the hot path. - DelPrefix is intentionally not observed: dispatchDelPrefixBroadcast doesn't resolve a single RouteID (it broadcasts to every shard), and the design treats per-route attribution there as out of scope. Tests: - TestShardedCoordinatorObservesEveryDispatchedMutation: cross-shard Put batch, verifies one Observe per element with the engine's resolved RouteID, OpWrite, and exact key/value lengths. - TestShardedCoordinatorWithoutSamplerStaysSafe: dispatches successfully both with no WithSampler call (interface-nil) and with a typed-nil *MemSampler.
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 50 minutes and 24 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds optional keyviz sampling to ShardedCoordinator: a Changes
Sequence Diagram(s)sequenceDiagram
autonumber
rect rgba(220,240,255,0.5)
participant Client
participant Coordinator as ShardedCoordinator
participant Dist as Distribution/Router
participant Sampler as keyviz.Sampler
participant Shard as ShardGroup
end
Client->>Coordinator: Dispatch(mutations)
Coordinator->>Dist: groupMutations -> resolve RouteID per mutation
Dist-->>Coordinator: routed groups (RouteID, elems)
loop per routed mutation
Coordinator->>Sampler: observeMutation(RouteID, OpWrite, keyLen, valueLen)
alt sampler is nil
Sampler--xCoordinator: no-op (early return)
else sampler present
Sampler-->>Coordinator: Observe recorded
end
end
Coordinator->>Shard: send grouped dispatches to target shards
Shard-->>Coordinator: responses
Coordinator-->>Client: aggregated response
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
kv/sharded_coordinator_sampler_test.go (1)
103-142: Optional: tighten the nil-safety contract assertion.Both subtests only assert
require.NoError. The PR description states that disabled samplers produce zero observations; the interface-nil case can't be checked without a recorder, but you could at least assertrequire.Nil(t, coord.sampler)(or a getter) for the "no WithSampler call" subtest to pin that the field stays unset rather than getting silently initialized somewhere. Minor — the panic-free assertion already covers the main contract.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@kv/sharded_coordinator_sampler_test.go` around lines 103 - 142, The test TestShardedCoordinatorWithoutSamplerStaysSafe should assert the sampler field stays unset for the "no WithSampler call" case; after constructing coord via tc.opt(NewShardedCoordinator(...)) add an assertion that coord.sampler is nil (or call an exported getter if sampler is unexported) to ensure the coordinator was not silently initialized with a sampler; leave the typed-nil subtest unchanged because it cannot be checked without a recorder.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@kv/sharded_coordinator_sampler_test.go`:
- Around line 103-142: The test TestShardedCoordinatorWithoutSamplerStaysSafe
should assert the sampler field stays unset for the "no WithSampler call" case;
after constructing coord via tc.opt(NewShardedCoordinator(...)) add an assertion
that coord.sampler is nil (or call an exported getter if sampler is unexported)
to ensure the coordinator was not silently initialized with a sampler; leave the
typed-nil subtest unchanged because it cannot be checked without a recorder.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f93f4220-e034-4bf1-b611-502700acc929
📒 Files selected for processing (2)
kv/sharded_coordinator.gokv/sharded_coordinator_sampler_test.go
|
@claude review |
|
Claude finished @bootjp's task in 2m 15s —— View job Code Review
Overall this is clean, focused, and well-scoped. The hot-path properties are preserved, and the test coverage is solid. A few issues to address: Issue 1 —
|
Round-1 review fixes for PR #645: - Claude bot Issue 1: WithSampler stores the interface value as supplied; the guard at the call site only checks interface-nil, not typed-nil. Document the nil-receiver requirement on the Sampler interface so a future implementor cannot land a panic-on-nil Observe and silently break dispatch. - Claude bot Issue 2: trim observeMutation comment to the two non-obvious points (reads bypass this path; the early return keeps the disabled hot path branch-only). Per CLAUDE.md, comments for the "what" duplicate the code. - Claude / CodeRabbit nit: TestShardedCoordinatorWithoutSamplerStaysSafe now asserts c.sampler stays the zero interface value for the "no WithSampler call" subcase, so a future refactor that silently initialises the field would fail the test.
|
@claude review |
|
Claude finished @bootjp's task in 1m 43s —— View job Code Review
This is a clean, well-scoped change. The issues from the previous review have all been resolved:
Five-pass findingsPass 1 — Data loss: No issues. Pass 2 — Concurrency / distributed failures: No issues. Pass 3 — Performance: No issues. The nil check ( Pass 4 — Data consistency: No issues. Pass 5 — Test coverage: Mostly solid, with two minor gaps:
Neither gap is a blocker. The cross-shard assertion in Design note — Production route-registration dependency
Overall: approve. The sampling hot path is correct, nil-safe, and allocation-free when disabled. The two test coverage gaps are minor and don't block merging. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
kv/sharded_coordinator.go (1)
978-985: LGTM — clean, nil-safe single call site.
observeMutationcorrectly guards the interface-nil case, and typed-nil is delegated to the implementation per thekeyviz.Samplercontract (see keyviz/sampler.go:68-75 andMemSampler.Observelines 324-327). Placing the call before grouping byGroupIDmakes the per-mutation attribution unambiguous.One thing worth being explicit about (in a follow-up doc tweak, not a blocker): observation fires before the Raft commit, so failed dispatches are still counted in the heatmap. That's almost certainly what you want for a workload heatmap, but it's worth pinning in the comment so a future contributor doesn't "fix" it by moving the call after
router.Commit.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@kv/sharded_coordinator.go` around lines 978 - 985, Update the comment above the observeMutation method in ShardedCoordinator to explicitly state that sampler.Observe is called before the Raft commit/router.Commit and therefore observations will include failed/aborted dispatches (they are counted in the heatmap); reference the observeMutation function and the call to c.sampler.Observe so future readers understand the intentional ordering and its implications for workload attribution.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@kv/sharded_coordinator_sampler_test.go`:
- Around line 87-96: The test currently calls engine.GetRoute(elem.Key) but
production routes via engine.GetRoute(routeKey(...)), so update the assertion to
resolve the route using the same transform: call
engine.GetRoute(routeKey(elem.Key)) and use the returned route.RouteID in the
sampleCall comparison (keeping sampleCall, ops.Elems, and calls[i] as before) to
ensure the test mirrors production routing behavior.
---
Nitpick comments:
In `@kv/sharded_coordinator.go`:
- Around line 978-985: Update the comment above the observeMutation method in
ShardedCoordinator to explicitly state that sampler.Observe is called before the
Raft commit/router.Commit and therefore observations will include failed/aborted
dispatches (they are counted in the heatmap); reference the observeMutation
function and the call to c.sampler.Observe so future readers understand the
intentional ordering and its implications for workload attribution.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5156ae81-8500-4932-8e02-54f15984b10f
📒 Files selected for processing (3)
keyviz/sampler.gokv/sharded_coordinator.gokv/sharded_coordinator_sampler_test.go
✅ Files skipped from review due to trivial changes (1)
- keyviz/sampler.go
Round-2 review fixes for PR #645: CodeRabbit nit on observeMutation: spell out the pre-commit ordering intent so a future contributor doesn't "fix" it by moving the call after router.Commit. The heatmap is meant to reflect offered load (useful for planning capacity); counting only committed writes would hide hotspots that show up as repeated proposal failures. CodeRabbit nit on TestShardedCoordinatorObservesEveryDispatchedMutation: production routes via engine.GetRoute(routeKey(mut.Key)) but the test was calling engine.GetRoute(elem.Key) directly. Both inputs ("b", "x") happen to be identity through normalizeRouteKey today, but a future change to internal-prefix handling could silently diverge the test from production. Mirror the production transform.
|
@claude review |
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
Claude finished @bootjp's task in 1m 53s —— View job Code Review
All issues from the two previous review rounds have been addressed in commit What was fixed since the last review
Five-pass findingsPass 1 — Data loss: No issues. Pass 2 — Concurrency: No issues. Pass 3 — Performance: No issues. The Pass 4 — Data consistency: No issues. The RouteID fed to the sampler comes directly from Pass 5 — Test coverage: Mostly solid. Two minor gaps remain (unchanged from the second review), neither is a blocker:
Design note — Production route registration
Positive notes
Verdict: approve. The implementation is correct, nil-safe, allocation-free on the disabled path, and the interface contract is clearly documented. |
…nator + admin) (#647) ## Summary End-to-end keyviz wiring on each node binary. Stacked on top of #645 (coordinator `WithSampler`) and #646 (admin server `RegisterSampler`) — those two PRs are merged into this branch via merge commits, so this PR's incremental diff is `main.go` + `main_keyviz_test.go`. - Three new flags: `--keyvizEnabled` (off by default — opt-in), `--keyvizStep`, `--keyvizMaxTrackedRoutes`, `--keyvizMaxMemberRoutesPerSlot`. Defaults pull from `keyviz.Default*` so flag help and code stay in sync. - `buildKeyVizSampler()` returns `*keyviz.MemSampler` when enabled, `nil` otherwise — the single decision point for "keyviz on/off." - `seedKeyVizRoutes()` copies the engine's startup route catalogue into the sampler. Route-watch propagation post-startup is a follow-up (Phase 3 in the design doc). - `startKeyVizFlusher()` runs `RunFlusher` in the existing errgroup and calls one final `Flush()` after ctx fires so the in-progress step is harvested at graceful shutdown. - Coordinator: chained `.WithSampler(...)` onto the existing `.WithLeaseReadObserver(...)` call so the dispatch hot path observes mutations. - AdminServer: `setupAdminService` now takes the `*MemSampler` and only calls `RegisterSampler` when it's non-nil — operators with keyviz disabled get `codes.Unavailable` on `GetKeyVizMatrix` instead of a spurious empty success. Implements `docs/admin_ui_key_visualizer_design.md` §5 / §10. After this lands the heatmap is fully usable end-to-end. ## Test plan - [x] `TestBuildKeyVizSamplerHonorsEnabledFlag` — flag on/off contract. - [x] `TestSeedKeyVizRoutesCopiesEngineCatalogue` — `engine.Stats()` seed path. - [x] `TestSeedKeyVizRoutesNoOpOnNilSampler` — disabled-sampler safety. - [x] `TestStartKeyVizFlusherReturnsAfterCancel` — graceful shutdown drains the pre-cancel counters via the final `Flush`. - [x] `go build .`, `go vet .`, `golangci-lint run ./...` clean. - [x] `go test -race -count=1 -run 'TestBuildKeyVizSampler|TestSeedKeyVizRoutes|TestStartKeyVizFlusher' .` passes. ## Merge order 1. Merge #645 (coordinator wiring). 2. Merge #646 (admin server wiring). 3. Rebase this branch on `main` (the merge commits in this PR collapse to no-ops) and merge.
…646) ## Summary - Implements `AdminServer.GetKeyVizMatrix` against the proto declared in `proto/admin.proto` (already generated; no proto edits). - New narrow `KeyVizSampler` interface in the adapter package (just `Snapshot(from, to time.Time) []keyviz.MatrixColumn`), so production wires `*keyviz.MemSampler` while tests pass an in-memory fake. - `AdminServer.RegisterSampler` mirrors `RegisterGroup`. Without it, `GetKeyVizMatrix` returns `codes.Unavailable` so callers can distinguish "keyviz disabled on this node" from "no data yet" (which is a successful empty response). - Pivots the column-major `MatrixColumn` slice into the row-major proto layout: one `KeyVizRow` per `RouteID` with values aligned to a parallel `column_unix_ms` slice. `KeyVizSeries` selection picks the matching per-row counter; `UNSPECIFIED` defaults to `Reads`. - `bucket_id` encodes `route:<id>` for individual slots and `virtual:<syntheticID>` for aggregate buckets. Aggregate rows carry `MemberRoutes` verbatim through `route_ids` and `route_count`. Implements the read-side half of `docs/admin_ui_key_visualizer_design.md` §5.2 / §6. The dispatch-side `Observe` wiring is in #645. ## Test plan - [x] `TestGetKeyVizMatrixReturnsUnavailableWhenSamplerNotRegistered` — verifies `codes.Unavailable` when no sampler registered. - [x] `TestGetKeyVizMatrixPivotsColumnsToRows` — two-column / two-route fixture, verifies the missing-row-becomes-zero contract. - [x] `TestGetKeyVizMatrixSeriesSelection` — table-driven across all five enum values including `UNSPECIFIED` defaulting to `Reads`. - [x] `TestGetKeyVizMatrixEncodesAggregateBucket` — virtual bucket layout (`bucket_id` prefix, `aggregate=true`, `route_ids`, `route_count`). - [x] `go test -race -count=1 -run TestGetKeyVizMatrix ./adapter/...` clean. - [x] `golangci-lint run ./adapter/...` clean.
## Summary Phase 2-B of the Key Visualizer design (`docs/admin_ui_key_visualizer_design.md`): integrate the heatmap into the existing `web/admin/` SPA rather than building a parallel dashboard. Doc + implementation in one PR (doc commit first, per CLAUDE.md design-doc-first workflow). - New page at `/keyviz` — canvas heatmap polling `/admin/api/v1/keyviz/matrix` with series picker (writes / reads / write_bytes / read_bytes), row-budget input (clamped at 1024), and off / 5 s / 30 s auto-refresh. - Cold cells (value 0) render as the page background, not a faint blue — spotting actually-cold routes stays the dominant visual signal. - Row-detail flyout on hover: bucket_id, start, end, aggregate, route_count, route_ids (with truncation indicator). `start` / `end` decode through a printable-or-hex preview so binary keys do not render as mojibake. - Dependency-free: hand-rolled five-stop colour ramp in `lib/colorRamp.ts`. No d3, no ECharts. Bundle grew from ~155 kB to ~208 kB (raw), 64 kB gzipped. Backend is unchanged. The handler at `/admin/api/v1/keyviz/matrix` and its sampler wiring already shipped under Phase 2-A (PRs #639 / #645 / #646 / #647 / #651 / #660 / #661 / #672). ## Design `docs/design/2026_04_27_proposed_keyviz_spa_integration.md` — proposed status, lands with this PR. Parent design §12 phase table is split into 2-A (server, shipped) / 2-B (SPA, this PR) / 2-C (cluster fan-out, open). ## Five-lens self-review 1. **Data loss** — n/a; SPA is read-only against an existing handler. 2. **Concurrency / distributed** — n/a; single browser tab polling a single handler. Sampler concurrency was already covered by Phase 2-A tests. 3. **Performance** — Canvas + per-cell `fillRect` runs under §10 budget at 1024 × 500. Auto-refresh defaults off; 5-second cadence is the lower bound (sampler flush is 1 s). 4. **Data consistency** — SPA renders whatever the handler returns; consistency guarantees come from the existing leader-issued counters in the sampler. 5. **Test coverage** — `tsc -b --noEmit` clean; `vite build` clean; `go build ./internal/admin/...` clean (embed glob unaffected); `go test ./internal/admin/...` clean. Manual verification documented in the design doc §5. ## Test plan - [x] `npm run lint` (`tsc -b --noEmit`) — clean - [x] `npm run build` (Vite) — clean, output goes to `internal/admin/dist` - [x] `go build ./internal/admin/...` — clean - [x] `go test ./internal/admin/...` — clean - [ ] Manual: `make run` + `make client`, navigate to `/keyviz`, see hot routes light up red within ~5 s of write traffic - [ ] Manual: series picker swaps the displayed counter; row-budget input clamps at 1024; auto-refresh polls without flicker ## Out of scope - **Cluster fan-out** — handler is currently node-local. Phase 2-C will add a cross-node admin RPC; this PR will pick up the aggregate view automatically once that ships. - **Drill-down per-route sparkline** — Phase 3. - **Routes / Raft Groups correlation** — Phase 1 SPA pages not yet built; correlation lands when those pages do. - **`localStorage` for series / rows / refresh** — punt to follow-up.
Summary
WithSampler(s keyviz.Sampler) *ShardedCoordinatoroption (mirroringWithLeaseReadObserver) plus a singleobserveMutationcall insidegroupMutations.(RouteID, mutation)pair produces onesampler.Observe(routeID, OpWrite, len(Key), len(Value))call before the mutation is grouped byGroupID. Reads do not reach this path; transactional dispatch reusesgroupMutationsso it gets wired automatically.DelPrefixis intentionally not observed —dispatchDelPrefixBroadcastbroadcasts to every shard rather than resolving a singleRouteID, so per-route attribution is out of scope for this slice.c.sampleris guarded at the call site; thekeyviz.MemSamplercontract also tolerates a typed-nil receiver, so a disabled sampler costs one branch off the hot path.Implements task §5.1 from
docs/admin_ui_key_visualizer_design.md(split out of the keyviz design originally landed in #639).Test plan
TestShardedCoordinatorObservesEveryDispatchedMutation— cross-shard Put batch, verifies one Observe per element with engine-resolved RouteID, OpWrite, and exact keyLen / valueLen.TestShardedCoordinatorWithoutSamplerStaysSafe— dispatches succeed with noWithSamplercall (interface-nil) and with a typed-nil*MemSampler.go test -race -count=1 ./kv/... ./keyviz/...clean.golangci-lint run ./kv/...clean.Summary by CodeRabbit
New Features
Tests