## Summary
- Add a `GroupID` field to `routeSlot`, and `RaftGroupID` and
`LeaderTerm` fields to `MatrixRow` (Phase 2-C+ §9.1 dedupe key)
- Add a `MemSampler.SetLeaderTerm(groupID, term)` API and snapshot
`groupTerms` at `Flush` time
- Extend `RegisterRoute` with a `groupID uint64` parameter; update all
call sites (`main.go` reads `r.GroupID` from
`distribution.Engine.Stats()`)
## Why
Per `docs/design/2026_04_27_proposed_keyviz_cluster_fanout.md` §9.1, the
cluster fan-out aggregator dedupes write samples by `(routeID,
raftGroupID, leaderTerm, columnAt)` instead of the conservative
max-merge. Max-merge undercounts during a leadership flip: when the new
leader picks up a window the old leader was halfway through, each
leader holds only a partial count and max-merge keeps just the larger
one. Carrying `RaftGroupID` and `LeaderTerm` on every row gives the
merge enough information to dedupe per term and sum across terms.
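
The per-term dedupe described above can be sketched roughly as follows. This is an illustration, not the aggregator code that lands in PR-3c: the `MatrixRow` shape is reduced to the fields the merge needs, and `cellKey`, `termKey`, `mergeWrites`, the `Writes` field, and the handling of cells that mix termed and term-less rows are all assumptions made for the example.

```go
package main

import "fmt"

// MatrixRow is a simplified, hypothetical stand-in for the real keyviz row;
// only the fields needed for the merge are shown.
type MatrixRow struct {
	RouteID     uint64
	RaftGroupID uint64
	LeaderTerm  uint64
	ColumnAt    int64  // column start time (assumed Unix seconds)
	Writes      uint64 // write count observed for this cell
}

// cellKey identifies one cell in the merged output.
type cellKey struct {
	RouteID  uint64
	ColumnAt int64
}

// termKey mirrors the dedupe key (routeID, raftGroupID, leaderTerm, columnAt).
type termKey struct {
	Cell    cellKey
	GroupID uint64
	Term    uint64
}

// mergeWrites dedupes rows that share a termKey (keeping the largest count),
// then sums the surviving counts across terms for each cell. Rows with
// LeaderTerm == 0 carry no term information and are max-merged instead.
func mergeWrites(rows []MatrixRow) map[cellKey]uint64 {
	perTerm := make(map[termKey]uint64)
	legacy := make(map[cellKey]uint64)
	for _, r := range rows {
		cell := cellKey{RouteID: r.RouteID, ColumnAt: r.ColumnAt}
		if r.LeaderTerm == 0 {
			if r.Writes > legacy[cell] {
				legacy[cell] = r.Writes
			}
			continue
		}
		k := termKey{Cell: cell, GroupID: r.RaftGroupID, Term: r.LeaderTerm}
		if r.Writes > perTerm[k] {
			perTerm[k] = r.Writes
		}
	}

	out := make(map[cellKey]uint64)
	for k, w := range perTerm {
		out[k.Cell] += w
	}
	// Assumption: if a cell has both termed and term-less rows,
	// keep whichever total is larger.
	for cell, w := range legacy {
		if w > out[cell] {
			out[cell] = w
		}
	}
	return out
}

func main() {
	rows := []MatrixRow{
		{RouteID: 1, RaftGroupID: 7, LeaderTerm: 3, ColumnAt: 100, Writes: 40}, // old leader
		{RouteID: 1, RaftGroupID: 7, LeaderTerm: 4, ColumnAt: 100, Writes: 25}, // new leader
		{RouteID: 1, RaftGroupID: 7, LeaderTerm: 4, ColumnAt: 100, Writes: 25}, // duplicate report
	}
	fmt.Println(mergeWrites(rows)[cellKey{RouteID: 1, ColumnAt: 100}])
}
```

In this example a plain max-merge would report 40 for the cell, while the per-term merge reports 65: the duplicate term-4 row is dropped and the two terms are summed.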
## Scope (PR-3a — sampler API only)
This PR ships the sampler-side API. Wiring lands in follow-up PRs:
- **PR-3a (this)**: `routeSlot.GroupID`, `MatrixRow.RaftGroupID +
LeaderTerm`, `SetLeaderTerm` API, `RegisterRoute` signature, `Flush`
term snapshot.
- **PR-3b (follow-up)**: periodic ticker in `main.go` that polls engine
Status and calls `SetLeaderTerm`; proto + JSON wire-format additions.
- **PR-3c (follow-up)**: aggregator-side `(group, term)`-keyed dedupe in
`internal/admin/keyviz_fanout.go`.
With `SetLeaderTerm` never called (this PR alone), every row emits
`LeaderTerm=0` and the fan-out merge falls back to today's max-merge →
no behavior change for legacy deployments.
## Test plan
- [x] `go test -race -count=1 ./keyviz/...` — passes
- [x] `go test -race -count=1 ./internal/admin/...` — passes (uses
MemSampler)
- [x] `go test -race -count=1 .` (root, `main_keyviz_test.go`) — passes
- [x] `go build ./...` — clean
- [x] `golangci-lint run` — 0 issues
- [ ] Jepsen — N/A (sampler-internal change, no replication/MVCC impact)
## Five-lens self-review
1. **Data loss** — no on-disk format change; legacy max-merge fallback
preserves today's behavior when `SetLeaderTerm` is never called.
2. **Concurrency** — `groupTermsMu` is a fine-grained `RWMutex`;
`Observe` never touches it; `snapshotGroupTerms` clones at the top of
`Flush` so every row in a column observes a stable view, even if
`SetLeaderTerm` fires concurrently.
3. **Performance** — `Observe` hot path is unchanged (no new map lookup,
no new lock); `Flush` gains one `RLock` + map clone per column, bounded
by `len(groupTerms)` (typically only a few groups).
4. **Data consistency** — `(routeID, raftGroupID, leaderTerm)` is a
strict superset of today's `routeID` dedupe key; rows with
`LeaderTerm=0` collapse to the legacy max-merge.
5. **Test coverage** — new `SetLeaderTerm` publishing + `Flush` snapshot
tests under `keyviz/sampler_test.go`; existing `RegisterRoute` test
signatures updated; race detector clean.
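
For reviewers, the concurrency argument in lens 2 can be pictured with the minimal sketch below. It is not the code in this PR: the struct layouts, the `NewMemSampler` constructor, and the `Flush(slots)` signature are simplified assumptions. Only the names `groupTermsMu`, `groupTerms`, `SetLeaderTerm`, `snapshotGroupTerms`, `routeSlot.GroupID`, `MatrixRow.RaftGroupID`, and `MatrixRow.LeaderTerm` come from the description above.

```go
package main

import (
	"fmt"
	"sync"
)

// routeSlot and MatrixRow are reduced to the fields this sketch needs.
type routeSlot struct {
	RouteID uint64
	GroupID uint64
}

type MatrixRow struct {
	RouteID     uint64
	RaftGroupID uint64
	LeaderTerm  uint64
}

// MemSampler is a simplified stand-in for the real sampler.
type MemSampler struct {
	groupTermsMu sync.RWMutex
	groupTerms   map[uint64]uint64 // raft group ID -> last published leader term
}

// NewMemSampler is a hypothetical constructor for this sketch.
func NewMemSampler() *MemSampler {
	return &MemSampler{groupTerms: make(map[uint64]uint64)}
}

// SetLeaderTerm publishes the current leader term for a raft group.
func (s *MemSampler) SetLeaderTerm(groupID, term uint64) {
	s.groupTermsMu.Lock()
	defer s.groupTermsMu.Unlock()
	s.groupTerms[groupID] = term
}

// snapshotGroupTerms returns a copy of the map taken under the read lock,
// so callers can read it without holding groupTermsMu.
func (s *MemSampler) snapshotGroupTerms() map[uint64]uint64 {
	s.groupTermsMu.RLock()
	defer s.groupTermsMu.RUnlock()
	snap := make(map[uint64]uint64, len(s.groupTerms))
	for g, t := range s.groupTerms {
		snap[g] = t
	}
	return snap
}

// Flush (assumed signature) stamps every row of one column from a single
// snapshot, so a concurrent SetLeaderTerm cannot split the column.
func (s *MemSampler) Flush(slots []routeSlot) []MatrixRow {
	terms := s.snapshotGroupTerms()
	rows := make([]MatrixRow, 0, len(slots))
	for _, sl := range slots {
		rows = append(rows, MatrixRow{
			RouteID:     sl.RouteID,
			RaftGroupID: sl.GroupID,
			LeaderTerm:  terms[sl.GroupID], // 0 if SetLeaderTerm was never called
		})
	}
	return rows
}

func main() {
	s := NewMemSampler()
	s.SetLeaderTerm(7, 3)
	rows := s.Flush([]routeSlot{{RouteID: 1, GroupID: 7}, {RouteID: 2, GroupID: 9}})
	fmt.Printf("%+v\n", rows)
}
```

Group 9 never had a term published, so its row carries `LeaderTerm=0`, which is the legacy max-merge case.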