You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
## Summary
Build the end-to-end pipeline that takes the sampler-side `RaftGroupID +
LeaderTerm` fields shipped in PR #709 (Phase 2-C+ PR-3a) and surfaces
them on the wire (proto + JSON + gRPC) so the fan-out aggregator (PR-3c,
next) has data to dedupe by `(group, term)` instead of the conservative
§4.2 max-merge.
## Why
Per `docs/design/2026_04_27_proposed_keyviz_cluster_fanout.md` §4.2, the
cluster fan-out today max-merges write samples per row — under stable
leadership this is exact, but under a leadership flip it understates the
true window total by at most one leader's pre-transfer slice. §9.1's
canonical fix is to dedupe by `(bucket_id, raft_group_id, leader_term,
column)` and sum across terms; that needs `raft_group_id + leader_term`
on every row of the wire format. PR #709 added the fields to
`MatrixRow`; this PR plumbs them through.
## Changes
**Wire-format additions (forward-compatible):**
- `proto/admin.proto`: `KeyVizRow.raft_group_id = 13` and `leader_term =
14`. Old SPAs ignore the new fields; old servers don't emit them — the
zero value is the documented "term not tracked" fallback that triggers
the legacy max-merge.
- `internal/admin/keyviz_handler.go`: JSON `KeyVizRow` gains
`RaftGroupID` and `LeaderTerm` (omitempty); `newKeyVizRowFrom` copies
`mr.RaftGroupID` / `mr.LeaderTerm`.
- `internal/admin/keyviz_fanout.go`: `mergeRowInto` seeds the
destination row with `RaftGroupID + LeaderTerm` on first-seen (PR-3c
will switch to a per-cell merge).
- `adapter/admin_grpc.go`: `newKeyVizRowFrom` copies the same two fields
onto the proto `KeyVizRow`.
**main.go ticker:**
- `startKeyVizLeaderTermPublisher` launches a goroutine that polls each
`runtime.engine.Status().Term` every `Step` (matching the flush cadence
so each column observes a fresh term). Skips entirely when the sampler
is nil or no runtimes are wired.
- `publishLeaderTerms` / `publishLeaderTermsFromSnapshots` split into
two helpers so unit tests can exercise publication without standing up a
full `raftengine.Engine` fake.
- The ticker publishes once immediately so the very first flush column
carries a non-zero term.
## Scope (PR-3b only)
- Wire format **emission**: ✓ this PR
- Ticker that **publishes** terms via `SetLeaderTerm`: ✓ this PR
- Aggregator **consumption** (per-cell `(group, term)` merge replacing
max-merge): **deferred to PR-3c**
- `Conflict` per-cell promotion: **deferred to PR-3c**
- SPA changes: **none required** — the wire-format extension is
forward-compatible
With this PR alone, the new fields are visible on every response but the
aggregator still uses §4.2 max-merge → no behavior change for SPA
consumers.
## Test plan
- [x] `go test -race -count=1 ./keyviz/... ./internal/admin/...` —
passes
- [x] New test coverage:
- `TestKeyVizHandlerStampsRaftIdentity` (JSON path)
- `TestMergeKeyVizMatricesPreservesRaftIdentity` (fan-out merge)
- `TestGetKeyVizMatrixStampsRaftIdentity` (gRPC path)
- `TestPublishLeaderTermsFromSnapshotsStampsRows` (end-to-end through
sampler)
- `TestPublishLeaderTermsFromSnapshotsNilSafe` (nil-receiver contract)
- `TestStartKeyVizLeaderTermPublisherSkipsWhenSamplerNil /
...WhenNoRuntimes` (errgroup lifecycle)
- [x] `go build ./...` — clean
- [x] `golangci-lint run` — 0 issues
- [ ] Jepsen — N/A (admin-side wire extension, no replication/MVCC/OCC
impact)
## Five-lens self-review
1. **Data loss** — wire-format extension only; legacy max-merge fallback
preserves today's behavior when `LeaderTerm=0`.
2. **Concurrency** — ticker is a single goroutine; `SetLeaderTerm` takes
`groupTermsMu` (Phase 2-C+ PR-3a contract); no new shared state.
3. **Performance** — `Observe` hot path is unchanged; ticker adds `N ×
Status()` calls per `Step` where `N` is the runtime count (typically ≤ a
few groups), and `Status()` is a non-blocking read.
4. **Data consistency** — `RaftGroupID + LeaderTerm` round-trip through
every wire path verified by table-driven tests.
5. **Test coverage** — each layer (JSON, fan-out, gRPC, publisher) has
its own focused regression guard; end-to-end test pins the
sampler→column propagation.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Keyviz matrix now includes Raft leadership metadata with group ID and
leader term fields in responses.
* Leadership term information is automatically tracked and included in
key visualization output.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
0 commit comments