Commit aa4a7ba

feat(encryption): Stage 6E-2d — §7.1 quiescence barrier scaffold (#933)
## Summary

Stage 6E-2d ships the §7.1 quiescence-barrier scaffold inside `EnableRaftEnvelope`. The barrier mechanism is fully wired and tested but unreachable from operators: `raftEnvelopeWrapEnabled` stays `false` in this slice (6E-2e wires the closure source from `main.go`; 6E-2f atomically flips the gate).

### What ships

- `raftengine.ErrEnvelopeCutoverInProgress` sentinel error returned on `Propose` under the barrier. `ProposeAdmin` is exempt by interface contract so the cutover marker itself (and ConfChange-time `RegisterEncryptionWriter`) cross the barrier.
- `dynamicWrappedProposer` barrier state (`barrierMu` + `inflightUser` counter + `drainSig` channel) gating user-`Propose` only.
- `BeginCutoverBarrier` / `WaitInflightDrained` / `EndCutoverBarrier` trio on `dynamicWrappedProposer` driving steps 1 / 2 / 6 of the §7.1 sequence.
- `ShardGroup` forwarder methods with degraded fast-path on raw-Engine fallback so test fixtures degrade gracefully (immediate-success drain, pre-closed Begin channel, no-op End).
- `CutoverBarrierController` interface + `WithEncryptionAdminCutoverBarrier` option on `EncryptionAdminServer` (no production impl yet; that lands in 6E-2e).
- `EnableRaftEnvelope` handler runs the §7.1 6-step sequence (Begin → WaitDrained → ProposeAdmin → awaitCutoverApply → InstallWrap → End-via-defer) when the gate is true. Refuses with `FailedPrecondition` when the controller or latest-applied-index callbacks are unwired so a misconfigured 6E-2f release fails closed before any side effect.
- `raftEnvelopeWrapEnabled` converts from `const` to `var` so unit tests can flip it via `t.Cleanup`-based override (production value unchanged at `false`; 6E-2f atomically flips it).

### Behavior change, risk, test evidence

- **Behavior change**: production-observable change is **none**. `raftEnvelopeWrapEnabled = false` short-circuits before any 6E-2d code path runs. The new error / barrier / state-machine are dead code under operator use until 6E-2e/2f land.
- **Risk**: low. The barrier mechanism is exercised end-to-end in unit tests but never engaged from production wiring.
- **Test evidence**:
  - `go test -race ./adapter/... ./kv/... ./internal/raftengine/...` targeting the encryption test surface: all pass.
  - `golangci-lint --config=.golangci.yaml run` (pre-commit hook fired during `git commit`): 0 issues.
  - New tests pin: barrier blocks `Propose` / allows `ProposeAdmin`; `End` reopens `Propose`; `WaitDrained` respects ctx; drain semantics under in-flight Propose; the `ShardGroup` forwarder degraded fallback + production delegation; the state machine refuses without barrier/applied-index, drives the expected sequence, and runs `End` even on propose error or drain timeout.

### Caller audit

- `ErrEnvelopeCutoverInProgress` is a new sentinel: no existing `errors.Is` callers. The documented future caller (`main_encryption_registration.go::RegisterEncryptionWriter`) routes through `ProposeAdmin` and is barrier-exempt as designed (comment at the call site already anticipated this).
- `raftEnvelopeWrapEnabled` const → var: production read sites unchanged (one read in `EnableRaftEnvelope`); no callers performed compile-time reasoning on the value. Tests serialize their writes via `t.Cleanup`-based restore so the parallel `TestEncryptionAdmin_EnableRaftEnvelope_GatedUntil6E2` (which reads the var) observes the original `false` value while paused at `t.Parallel()`.

### 5-lens self-review

1. **Data loss**: none. `ErrEnvelopeCutoverInProgress` returns BEFORE `engine.Propose` so the entry was never accepted.
2. **Concurrency / distributed failures**: `barrierMu` serializes (read-barrier, inc-counter) atomically with (open-barrier, sample-counter); see the `dynamicWrappedProposer` type comment for the race that motivates a mutex over three atomics. Leader change mid-cutover: the handler's `requireLeader` precheck guards the entry; `ProposeAdmin` from a former leader returns `NotLeader` / `LeadershipLost`, the deferred `End` fires, the barrier closes, and the operator's retry restarts the sequence on the new leader.
3. **Performance**: one `sync.Mutex` acquire+release per `Propose` call (~50 ns), negligible vs Raft consensus latency. No new metrics, no new Raft round-trips, no new Pebble reads.
4. **Data consistency**: MVCC/OCC/HLC visibility unchanged. HLC lease renewal (which calls `Propose`, not `ProposeAdmin`) does observe the barrier; a single missed renewal tick during the few-ms barrier window is within the existing renewal tolerance (`hlcRenewalInterval = 1s`, `hlcPhysicalWindowMs = 3s`). The `awaitCutoverApply` step-4 guarantees the cutover entry has been applied locally before step-5 `InstallWrap` runs, closing the propose-side race where the wrap closure could be installed before the engine sees the cutover index.
5. **Test coverage**: six new tests in `kv/` covering the barrier semantics; five new tests in `adapter/` covering the state machine. The barrier-incapable fallback path is covered alongside the production delegation path so a regression that swapped one for the other surfaces immediately.

## Test plan

- [x] `go test -race` on `adapter`, `kv`, `internal/raftengine` (targeted encryption surface)
- [x] `golangci-lint run` clean
- [ ] Full `make test` in CI
- [ ] No Jepsen suite: barrier is unreachable from production paths until 6E-2f flips the gate. Jepsen coverage will attach to 6E-2f / Stage 9.
2 parents 9a79469 + f8081e0 commit aa4a7ba

7 files changed

Lines changed: 1598 additions & 8 deletions

File tree

adapter/encryption_admin.go

Lines changed: 347 additions & 3 deletions
Large diffs are not rendered by default.

adapter/encryption_admin_test.go

Lines changed: 580 additions & 4 deletions
Large diffs are not rendered by default.

docs/design/2026_05_31_partial_6e_enable_raft_envelope.md

Lines changed: 6 additions & 1 deletion
@@ -14,7 +14,12 @@
 |---|---|---|
 | 6E-1a — FSM apply machinery (`applyEnableRaftEnvelope`, sidecar field plumbing, wire sub-tag whitelist) | shipped | #899 (3bffd344) |
 | 6E-1b — `EnableRaftEnvelope` admin RPC + `elastickv-admin enable-raft-envelope` CLI subcommand (server method **gated** until 6E-2; see §3.3 below) | shipped | #907 |
-| 6E-2 — engine unwrap-on-apply + coordinator wrap-on-propose + §7.1 proposal-quiescence barrier (atomic 3-piece flip; also flips the 6E-1b gate to true) | not started ||
+| 6E-2a — typed cutover sentinel + sidecar `RaftEnvelopeCutoverIndex` apply seam | shipped | (rolled into earlier slices) |
+| 6E-2b — `ProposeAdmin` sibling on `raftengine.Proposer` (barrier-exempt by interface contract) | shipped | (rolled into earlier slices) |
+| 6E-2c — Coordinator `dynamicWrappedProposer` + `ShardGroup.raftPayloadWrap` hot-swap + `Proposer()` accessor + Internal.Forward wrap-aware proposer + fail-closed startup guard on active cutover | shipped | #922 (eb371ca6) |
+| 6E-2d — §7.1 6-step quiescence barrier on `dynamicWrappedProposer.Propose` + `ShardGroup` barrier forwarders + `CutoverBarrierController` option + state-machine in `EnableRaftEnvelope` handler (gated behind `raftEnvelopeWrapEnabled = false`; flipped in 6E-2f) | shipped | this PR |
+| 6E-2e — `main.go` wiring: `OpenConfig.RaftCipher` + `RaftCutoverIndex` + `CutoverBarrierController` implementation fanning out over participating `ShardGroup`s. **BLOCKER (a):** route admin RPCs (RotateDEK, RegisterEncryptionWriter) through the wrap-aware proposer so post-cutover admin entries are wrapped — the raw-engine ProposeAdmin path leaves cleartext admin entries at `index > cutoverIdx` and §6.3 halts the cluster (codex P1 #1 round-2 on PR933). **BLOCKER (b):** auto-install the wrap on every replica's FSM-apply of the cutover marker so a leader failover between cutover commit and `InstallWrap` doesn't admit cleartext writes on the newly-elected leader (codex P1 round-3 on PR933). Both blockers MUST land before 6E-2f flips the gate. | not started ||
+| 6E-2f — atomic flip of `raftEnvelopeWrapEnabled` to `true` (the §3.3 6E-1b gate release) | not started ||
 | 6E-3 — §6C-4 fail-closed guards (`ErrEnvelopeCutoverDivergence`, `ErrEncryptionNotBootstrapped`, `ErrLocalEpochOutOfRange`) | not started ||
 
 With 6E-1 (both sub-milestones) complete, the wire-format and

internal/raftengine/engine.go

Lines changed: 14 additions & 0 deletions
@@ -23,6 +23,20 @@ var (
 	// ErrLeadershipTransferInProgress indicates a leadership transfer
 	// is under way and proposals are being held back.
 	ErrLeadershipTransferInProgress = errors.New("raft engine: leadership transfer in progress")
+	// ErrEnvelopeCutoverInProgress indicates the §7.1 raft-envelope
+	// cutover barrier is open on this leader and rejecting fresh
+	// USER proposals on the Propose path. The error is the step-1
+	// gate-rejection surface for coordinator clients while the
+	// EnableRaftEnvelope handler runs the 6-step barrier sequence
+	// (block intake → drain → propose cutover via ProposeAdmin →
+	// wait apply → flip wrap → unblock). Callers should treat it
+	// as a transient back-off: a healthy cluster spends single-
+	// digit milliseconds inside the barrier. ProposeAdmin is the
+	// design's barrier exemption (admin entries that MUST remain
+	// admissible across the barrier — the cutover marker itself,
+	// ConfChange-time RegisterEncryptionWriter) and never observes
+	// this error.
+	ErrEnvelopeCutoverInProgress = errors.New("raft engine: envelope cutover in progress")
 )
 
 type State string

kv/raft_payload_wrapper.go

Lines changed: 237 additions & 0 deletions
@@ -2,6 +2,7 @@ package kv
 
 import (
 	"context"
+	"sync"
 	"sync/atomic"
 
 	"github.com/bootjp/elastickv/internal/raftengine"
@@ -131,9 +132,44 @@ func (p *wrappedProposer) ProposeAdmin(ctx context.Context, data []byte) (*rafte
 // stores nil to express "wrap inactive". This is so the call site
 // (typically a ShardGroup) owns the storage and the proposer just
 // reads.
+//
+// §7.1 quiescence-barrier state (Stage 6E-2d) lives here too — see
+// the BeginCutoverBarrier / WaitInflightDrained / EndCutoverBarrier
+// trio below for the 6-step state machine's mechanics. The barrier
+// gates only the user-Propose path; ProposeAdmin is exempt by
+// interface contract so the EnableRaftEnvelope handler can propose
+// the cutover marker itself across its own barrier.
 type dynamicWrappedProposer struct {
 	inner   raftengine.Proposer
 	wrapPtr *atomic.Pointer[RaftPayloadWrapper]
+
+	// barrierMu guards barrierOpen, inflightUser, and drainSig. Held
+	// only briefly on Propose entry/exit (the engine.Propose call
+	// itself runs without the mutex), so contention is bounded by
+	// the per-Propose acquire/release pair. The 6E-2d handler holds
+	// it longer on the BeginCutoverBarrier / EndCutoverBarrier
+	// transitions, but those are once per cutover (~ms-scale).
+	//
+	// Why a mutex rather than three atomics: the barrier-open check
+	// and the in-flight counter increment MUST be observed as one
+	// atomic transition, or a Propose that read barrierOpen=false
+	// just before the cutover handler stores true could increment
+	// the counter AFTER BeginCutoverBarrier observed it as 0,
+	// leaving WaitInflightDrained returning success while a fresh
+	// user proposal slips through to engine.Propose. The mutex is
+	// the simplest way to make Propose's (read-barrier, inc-counter)
+	// pair atomic with the handler's (open-barrier, sample-counter)
+	// pair on the other side.
+	barrierMu    sync.Mutex
+	barrierOpen  bool
+	inflightUser int64
+	// drainSig is freshly allocated each BeginCutoverBarrier call
+	// and closed when inflightUser drops to 0 after the barrier has
+	// opened (either at BeginCutoverBarrier time if no Propose is
+	// in flight, or at the last proposeExit that hits 0). nil
+	// outside a barrier cycle so WaitInflightDrained called without
+	// an active barrier degrades gracefully to immediate success.
+	drainSig chan struct{}
 }
 
 // newDynamicWrappedProposer wires a proposer that consults wrapPtr
@@ -174,7 +210,24 @@ func (p *dynamicWrappedProposer) currentWrap() RaftPayloadWrapper {
 	return nil
 }
 
+// Propose runs the §7.1 quiescence-barrier gate, then forwards the
+// (optionally wrapped) payload to the inner Propose. The gate
+// rejects with ErrEnvelopeCutoverInProgress while the barrier is
+// open — the 6E-2d EnableRaftEnvelope handler holds the barrier
+// open for the few-ms it takes to commit the cutover entry and
+// publish the active wrap closure. Callers should treat the error
+// as a transient back-off (retry on a new leader-issued ts).
+//
+// The barrierMu + inflight-counter pair makes (read-barrier,
+// inc-counter) atomic with the handler's (open-barrier,
+// sample-counter); see the type comment for the race that motivates
+// it.
 func (p *dynamicWrappedProposer) Propose(ctx context.Context, data []byte) (*raftengine.ProposalResult, error) {
+	if err := p.beginUserPropose(); err != nil {
+		return nil, err
+	}
+	defer p.endUserPropose()
+
 	wrapped, err := applyRaftPayloadWrap(p.currentWrap(), data)
 	if err != nil {
 		return nil, err
@@ -186,6 +239,190 @@ func (p *dynamicWrappedProposer) Propose(ctx context.Context, data []byte) (*raf
 	return res, nil
 }
 
+// beginUserPropose increments the in-flight counter under
+// barrierMu, returning ErrEnvelopeCutoverInProgress if the barrier
+// is open. Returning the error without incrementing is intentional:
+// a rejected Propose must not count toward in-flight (the handler
+// would otherwise wait on a fictional in-flight that will never
+// drain). Pair with endUserPropose.
+func (p *dynamicWrappedProposer) beginUserPropose() error {
+	p.barrierMu.Lock()
+	defer p.barrierMu.Unlock()
+	if p.barrierOpen {
+		return raftengine.ErrEnvelopeCutoverInProgress
+	}
+	p.inflightUser++
+	return nil
+}
+
+// endUserPropose pairs with beginUserPropose: decrement the
+// in-flight counter under barrierMu and, if the barrier is open
+// and we just dropped to 0, close drainSig so the handler's
+// WaitInflightDrained unblocks. Idempotent close via the
+// select/default pattern so concurrent EndCutoverBarrier doesn't
+// race a late proposeExit on a freshly nil-ed drainSig (the nil
+// check covers EndCutoverBarrier having already torn down the
+// channel; the closed-recv guard covers the BeginCutoverBarrier
+// path that closed an empty-inflight channel at open time).
+func (p *dynamicWrappedProposer) endUserPropose() {
+	p.barrierMu.Lock()
+	defer p.barrierMu.Unlock()
+	p.inflightUser--
+	if p.barrierOpen && p.inflightUser == 0 && p.drainSig != nil {
+		select {
+		case <-p.drainSig:
+			// already closed at BeginCutoverBarrier-with-no-inflight
+			// or by a sibling exit; treat as success.
+		default:
+			close(p.drainSig)
+		}
+	}
+}
+
+// BeginCutoverBarrier opens the §7.1 step-1 quiescence barrier.
+// After return, every fresh dynamicWrappedProposer.Propose call
+// fails with ErrEnvelopeCutoverInProgress until EndCutoverBarrier
+// runs. ProposeAdmin is unaffected (the cutover marker proposes
+// through it).
+//
+// HAZARDS — per-leader scope of the barrier (codex P1 round-2 and
+// round-3 on PR933): the barrier is an in-memory data structure
+// owned by THIS leader's dynamicWrappedProposer. It does not
+// coordinate across the cluster, and it does not survive leadership
+// transfer. Two related future-state failure modes follow from
+// that scope and MUST be closed by 6E-2e before 6E-2f flips the
+// gate; 6E-2d ships them inert by leaving raftEnvelopeWrapEnabled
+// false so production never opens the cutover window.
+//
+// (a) Wrap-gap admin RPCs (codex P1 #1 round-2):
+// Other admin RPCs that route through ProposeAdmin (RotateDEK,
+// RegisterEncryptionWriter) are barrier-exempt and currently
+// reach the engine via the raw-engine s.proposer in
+// adapter/encryption_admin.go. Between the cutover marker's
+// commit and the handler's InstallWrap call, an admin RPC
+// that lands at `index > raftEnvelopeCutoverIndex` would be
+// cleartext, and the §6.3 strict-`>` apply hook on every
+// follower would treat it as a wrapped envelope and halt
+// apply cluster-wide.
+//
+// Remediation options for 6E-2e:
+// Option A (preferred): route RotateDEK /
+// RegisterEncryptionWriter through the
+// wrap-aware proposer so post-cutover
+// admin entries are wrapped. The
+// cutover marker itself remains on a
+// separate raw-engine reference held
+// by the EnableRaftEnvelope handler.
+// Option B: extend cutoverSem to serialize RotateDEK and
+// RegisterEncryptionWriter against the
+// EnableRaftEnvelope handler so no admin RPC can
+// race the barrier window.
+// See main_encryption_registration.go's call-site comment for
+// the 7c §3.1 wiring that Option A would extend.
+//
+// (b) Leader failover mid-cutover (codex P1 round-3):
+// If leadership transfers from L1 to L2 between the cutover
+// marker's commit and L1's InstallWrap call, L2 has its own
+// barrierOpen=false and a nil wrap pointer. L2 admits a fresh
+// user proposal through Propose without wrapping; it lands at
+// `index > raftEnvelopeCutoverIndex` in cleartext, and once
+// L2 (or any follower) applies the cutover marker the §6.3
+// strict-`>` hook treats every subsequent cleartext proposal
+// as a wrapped envelope and halts.
+//
+// Remediation options for 6E-2e:
+// Option A (preferred): auto-install the wrap on every
+// replica's FSM-apply of the cutover
+// marker so L2's
+// dynamicWrappedProposer publishes the
+// same wrap closure independently of
+// leadership state. The handler's
+// InstallWrap call then becomes a
+// redundant convenience (matches the
+// state every follower will reach via
+// the apply path).
+// Option B: make dynamicWrappedProposer.Propose consult the
+// sidecar's RaftEnvelopeCutoverIndex on every call
+// and refuse when the wrap pointer is nil but the
+// sidecar already reflects a cutover. Trades a
+// sidecar load per propose for a closed gap.
+//
+// Both hazards (a) and (b) share a single shape: post-cutover
+// cleartext entries land in Raft at indexes that the §6.3 apply
+// hook treats as wrapped. The gate (`raftEnvelopeWrapEnabled =
+// false`) is the only thing keeping either from triggering today.
+//
+// Idempotent against double-Begin: a second call freshens drainSig
+// and leaves barrierOpen true. CALLER SAFETY: a goroutine that was
+// blocked on a prior cycle's drainSig from WaitInflightDrained is
+// orphaned by the freshen (it never observes the close because the
+// channel reference was discarded). This is safe in practice
+// because the EncryptionAdminServer serializes EnableRaftEnvelope
+// calls via cutoverSem, so only one handler goroutine ever drives
+// the BeginCutoverBarrier → WaitInflightDrained → EndCutoverBarrier
+// sequence at a time. A future caller that drives the barrier
+// outside that semaphore MUST not rely on Begin's idempotency to
+// rescue an orphaned WaitInflightDrained — explicit End/Begin
+// ordering on a single goroutine is the only correct usage.
+//
+// Returns the channel that closes when in-flight drains to 0 so
+// callers MAY block on it directly; the recommended pattern is to
+// use WaitInflightDrained which composes context cancellation.
+func (p *dynamicWrappedProposer) BeginCutoverBarrier() <-chan struct{} {
+	p.barrierMu.Lock()
+	defer p.barrierMu.Unlock()
+	p.drainSig = make(chan struct{})
+	p.barrierOpen = true
+	if p.inflightUser == 0 {
+		// Fast path: no in-flight, drain is already complete.
+		close(p.drainSig)
+	}
+	return p.drainSig
+}
+
+// WaitInflightDrained blocks until the in-flight Propose counter
+// drops to 0 after BeginCutoverBarrier was called, or ctx fires.
+// Returns nil on drain, an error wrapping ctx.Err() with a
+// domain-only prefix on cancellation (the ShardGroup forwarder
+// adds the package prefix so operator logs don't carry a
+// redundant "kv: ... kv: ..." chain, claude r2 finding B), and
+// nil also if no barrier is currently open (degraded fast-path so
+// out-of-sequence calls don't deadlock callers).
+func (p *dynamicWrappedProposer) WaitInflightDrained(ctx context.Context) error {
+	p.barrierMu.Lock()
+	ch := p.drainSig
+	p.barrierMu.Unlock()
+	if ch == nil {
+		// No active barrier — drain is trivially complete.
+		return nil
+	}
+	select {
+	case <-ch:
+		return nil
+	case <-ctx.Done():
+		return errors.Wrap(ctx.Err(), "drain await canceled")
+	}
+}
+
+// EndCutoverBarrier closes the §7.1 step-6 barrier. After return,
+// fresh Propose calls succeed again. Idempotent against a
+// no-barrier-active state. Callers MUST call this once for each
+// BeginCutoverBarrier (the EnableRaftEnvelope handler uses defer).
+func (p *dynamicWrappedProposer) EndCutoverBarrier() {
+	p.barrierMu.Lock()
+	defer p.barrierMu.Unlock()
+	p.barrierOpen = false
+	// Drop the drainSig reference so a stale WaitInflightDrained
+	// caller that reads the channel after EndCutoverBarrier sees
+	// nil (immediate-success degraded path) rather than blocking
+	// on a closed channel from a previous cycle. Whether the
+	// channel was closed or not at this point depends on whether
+	// in-flight drained; both shapes are acceptable transient
+	// states because no new BeginCutoverBarrier has run yet to
+	// allocate a fresh channel.
+	p.drainSig = nil
+}
+
 // ProposeAdmin mirrors Propose's wrap-applies semantics. See
 // wrappedProposer.ProposeAdmin for the design rationale (the wrap
 // layer is NOT a barrier exemption; the EnableRaftEnvelope cutover
