Commit 32d8b93
authored
feat(keyviz): fan-out §9.1 (group, term) dedupe (Phase 2-C+ PR-3c) (#822)
## Summary
Replace the Phase 2-C **§4.2 row-level max-merge** for writes with the
parent design's canonical **§9.1 per-cell `(group, term)` dedupe +
sum-across-terms**. This is the headline Phase 2-C+ behavior change:
under a mid-window leader flip, the cluster heatmap now recovers the
TRUE write count instead of being conservatively bounded by the larger
leader's slice.
## Why
Per `docs/admin_ui_key_visualizer_design.md` §9.1:
> "For write samples the authoritative identity is `(raftGroupID,
leaderTerm)` — by Raft invariants at most one leader exists per term per
group — so the admin binary collapses write samples to one value per
`(bucketID, raftGroupID, leaderTerm, windowStart)` key. ... Across
distinct `leaderTerm` values for the same group and window, values are
summed because each term's leader only observed its own term's writes."
PR-3a (#709) gave the sampler the `GroupID + LeaderTerm` fields. PR-3b
(#720) wired them through the proto + JSON wire as parallel arrays and
started the `main.go` publisher. This PR consumes them.
## Algorithm (per cell, writes series)
1. For each source's contribution at this cell, classify by `(group,
term)`.
2. Within the same `(group, term)`: keep the larger value (Raft
invariant — agreement is the steady state); raise `conflict=true` on
disagreement (multiple "leaders" reporting writes for the same term is a
real divergence).
3. **Sum across distinct `(group, term)` entries** — each term's leader
only observed its own term's writes, so summing is exact.
4. **Fail-closed fallback**: if ANY contribution has `group=0` or
`term=0` (legacy peer or a publisher that hasn't fired yet), fall back
to the §4.2 max-merge for the WHOLE cell. Summing across an unknown-term
entry would risk double-counting overlapping windows.
Reads / read_bytes unchanged — local serves are independent and never
need per-leader dedupe (sum-across-nodes is exact). Write_bytes follows
writes.
## Structural change
`mergeRowInto` no longer writes through into `dst.Values` incrementally.
Each `(bucket, cell)` accumulates state in a `cellMergeAcc`; after all
source rows have been processed, `resolveRowMergeAcc` materialises one
final `KeyVizRow` per bucket. The accumulator carries:
- `sum` / `lastGroup` / `lastTerm`: read-path running sum + best-effort
identity.
- `byTerm[(group, term)] → max value`: canonical write-path table.
- `termConflict[(group, term)] → bool`: per-key disagreement flag.
- `hasUnknownTerm`: flips fallback path on first unknown contribution.
- `fallbackMax`, `fallbackSeenNonZeroValue`, `fallbackNonZeroDistinct`:
fallback max-merge state, maintained for every non-zero contribution so
a cell that later flips into the fallback path has the right running max
+ disagreement signal.
The dispatch is gated by `mergeUsesGroupTermDedupe(series)` — `true` for
writes/write_bytes, `false` for reads/read_bytes.
## Caller audit (semantic change to writes value)
- `mergeKeyVizMatrices` → only `KeyVizFanout.Run` (same package).
- `KeyVizFanout.Run` → only `KeyVizHandler.ServeHTTP` (same package).
- HTTP `/admin/api/v1/keyviz/matrix` → SPA at
`web/admin/src/pages/KeyViz.tsx` treats `values[]` as opaque numbers and
reads `row.conflict` as a coarse hatch signal. No max/sum assumption, no
consumer-side change required.
- gRPC `AdminServer.GetKeyVizMatrix` returns single-node and does not
invoke `mergeKeyVizMatrices` — unaffected.
No structural caller change. The numeric meaning of `values[]` for
writes shifts from "max across nodes" to "§9.1 sum-across-terms", which
is the documented intentional behavior of PR-3c.
## Out of scope
- **Per-cell `Conflicts []bool` wire format**: parent design §9.1 says
"the cell-level flag will land with the leaderTerm-based merge".
Deferred to a future PR-3d so this PR's diff is contained to the merge
algorithm itself. The row-level `Conflict bool` is now driven by
any-cell-saw-conflict, matching the existing SPA hatch contract.
- **SPA per-cell hatching**: requires the per-cell wire change above.
## Test plan
- [x] `go test -race -count=1 ./internal/admin/... ./keyviz/...` — pass.
- [x] `golangci-lint run` (full project) — 0 issues.
- [x] New tests:
- `TestMergeKeyVizMatricesGroupTermSumAcrossTerms`: ex-leader (term=42,
value=30) + new-leader (term=43, value=50) on the SAME cell merge to 80
(sum), recovering the pre-transfer slice §4.2 max would have lost.
- `TestMergeKeyVizMatricesGroupTermConflictWithinSameTerm`: two sources
reporting different non-zero values for the SAME `(group, term)` raise
`conflict=true` (Raft invariant violation).
- `TestMergeKeyVizMatricesGroupTermFallbackOnUnknownTerm`: modern
(group=7, term=42, value=30) + legacy (no arrays, value=50) fall back to
max=50, not sum=80 — the fail-closed guard preserves PR-3b's
mixed-version safety.
- [x] Existing PR-3b tests
(`TestMergeKeyVizMatricesPerCellIdentityMatchesValueOwner`,
`TestMergeKeyVizMatricesLegacyPeerWinnerResetsIdentity`, the §4.2-shape
`TestMergeKeyVizMatricesWritesMax*` tests with empty arrays which now
hit the fallback path) all continue to pass.
- [ ] Jepsen — N/A (admin-side merge algorithm, no replication/MVCC/OCC
impact).
## Five-lens self-review
1. **Data loss** — admin-side merge only; no on-disk format change. The
fail-closed fallback preserves PR-3b's mixed-version safety. A cluster
running mixed PR-3b/PR-3c binaries gets max-merge (PR-3b semantics) for
any cell touched by an old binary, and §9.1 sum elsewhere.
2. **Concurrency** — fan-out merge runs single-goroutine after parallel
peer calls return; no new shared state.
3. **Performance** — per-cell map allocation only when ≥1 source
contributes a known-term write to that cell (lazy `c.byTerm == nil`
check). Cells without known-term contributions (legacy fallback) skip
map allocation entirely. The accumulator-based two-pass merge is
O(sources × rows × columns), same asymptotic as the previous incremental
merge.
4. **Data consistency** — §9.1 algorithm is the canonical reference.
Tests pin the three nontrivial cases: sum across distinct terms
(recovers the leader-flip slice), conflict within same term (Raft
invariant violation), unknown-term fallback (mixed-version safety).
5. **Test coverage** — algorithm pinned with three new table-driven
cases plus the existing PR-3b regression guards.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Improvements**
* More accurate merge behavior for write paths with finer-grained
grouping and robust fallback when contributor identity is unknown,
reducing incorrect aggregates and surfacing row-level conflicts
reliably.
* **Documentation**
* Clarified conflict semantics to explain when row-level conflict is set
and future per-cell conflict plans.
* **Tests**
* Added unit tests covering merge summing across terms, conflict
detection, and unknown-identity fallback.
<!-- review_stack_entry_start -->
[](https://app.coderabbit.ai/change-stack/bootjp/elastickv/pull/822?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack)
<!-- review_stack_entry_end -->
<!-- end of auto-generated comment: release notes by coderabbit.ai -->3 files changed
Lines changed: 409 additions & 106 deletions
0 commit comments