Skip to content

Commit 3951337

Browse files
committed
docs(snapshot-skip B2 round-6): claude[bot] round-5 conditional LGTM items
claude[bot] round-5 verdict was 'Ready to merge pending design doc update'. Two non-code items: 1) engine.go:2001 TODO sentinel — the round-5 PR comment claimed a 'sentinel comment in engine.applySnapshot' would land, but no such comment was added. Round-6 adds it directly above the openAndRestoreFSMSnapshot call so a future B3 / follow-up implementor can find the planned SetDurableAppliedIndex hook site without re-discovering the rationale. Cites the design doc Non-Goals scope-out and PR #915 round-4/5 codex P2. 2) docs/design/2026_06_02_idempotent_snapshot_restore.md — three stale references to AppliedIndexReporter / factory-method pattern, superseded by PR #915 round-5 (kvFSM directly satisfies raftengine.AppliedIndexReader via LastAppliedIndex). The stale pseudocode would mislead the B3 implementor into adding a reporter shim that was already superseded. Updates: - §3 'kvFSM exposes its store' block rewritten to show direct interface satisfaction + the compile-time guard. - §4 fsmAlreadyAtIndex pseudocode rewritten to use direct type-assert against raftengine.AppliedIndexReader. - Implementation Plan B2 row updated to reflect the actual kvFSM.LastAppliedIndex() shape + the round-5 monotonic applyMu.Lock() RMW guard on SetDurableAppliedIndex. - §8 compatibility list: AppliedIndexReporter removed; replaced by AppliedIndexWriter (the actually-shipped interface) plus a retrospective sentence explaining the round-5 supersession. Three retrospective mentions of AppliedIndexReporter remain (as 'why this is NOT what we use') so future readers understand the design history without re-deriving it. No code semantics change in round-6. Doc + comment-only. Tests: go vet + golangci-lint clean across kv/, store/, internal/raftengine/etcd/. Refs PR #915 round-5 claude[bot] verdict ('Ready to merge pending design doc update').
1 parent a8e2cf5 commit 3951337

2 files changed

Lines changed: 52 additions & 19 deletions

File tree

docs/design/2026_06_02_idempotent_snapshot_restore.md

Lines changed: 43 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -236,21 +236,37 @@ type AppliedIndexReader interface {
236236
}
237237
```
238238

239-
`kvFSM` exposes its store through a typed accessor so wal_store.go can
240-
reach it without importing the concrete pebble type:
239+
`kvFSM` directly satisfies `raftengine.AppliedIndexReader` by
240+
forwarding to its underlying store (PR #915 round-5 — round-1 had a
241+
factory `AppliedIndexReader() AppliedIndexReader` method intended to
242+
be called through a separate `AppliedIndexReporter` interface, but
243+
that pattern would require the skip gate to know about the reporter
244+
shim; the direct interface satisfaction is simpler and a compile-time
245+
guard catches future signature drift):
241246

242247
```go
243-
type AppliedIndexReporter interface {
244-
AppliedIndexReader() AppliedIndexReader
245-
}
246-
func (f *kvFSM) AppliedIndexReader() AppliedIndexReader {
247-
if r, ok := f.store.(AppliedIndexReader); ok {
248-
return r
248+
// kv/fsm.go
249+
func (f *kvFSM) LastAppliedIndex() (uint64, bool, error) {
250+
r, ok := f.store.(raftengine.AppliedIndexReader)
251+
if !ok {
252+
return 0, false, nil
249253
}
250-
return nil
254+
idx, present, err := r.LastAppliedIndex()
255+
if err != nil {
256+
return 0, false, errors.WithStack(err)
257+
}
258+
return idx, present, nil
251259
}
260+
261+
// kv/fsm_applied_index_iface_check.go (compile-time guard)
262+
var _ raftengine.AppliedIndexReader = (*kvFSM)(nil)
263+
var _ raftengine.AppliedIndexWriter = (*kvFSM)(nil)
252264
```
253265

266+
The compile-time guard means any future rename or signature drift
267+
fails `go build` immediately — the soak investment is protected at
268+
the compiler level.
269+
254270
### 4. Conditional restore (with conservative error fallback)
255271

256272
```go
@@ -279,16 +295,20 @@ func restoreSnapshotState(fsm StateMachine, snapshot raftpb.Snapshot, fsmSnapDir
279295

280296
// fsmAlreadyAtIndex returns true ONLY when we can prove the FSM is
281297
// already at or past `want`. Any uncertainty -- FSM doesn't expose
282-
// the reporter, store doesn't expose the reader, read error, or
283-
// missing meta key -- returns false so we fall back to the full
284-
// restore. A stale-but-incorrect skip is far worse than a wasteful
285-
// full restore; the fallback errs strictly toward restoring.
298+
// the reader interface, read error, or missing meta key -- returns
299+
// false so we fall back to the full restore. A stale-but-incorrect
300+
// skip is far worse than a wasteful full restore; the fallback errs
301+
// strictly toward restoring.
302+
//
303+
// Direct type-assert against raftengine.AppliedIndexReader (PR #915
304+
// round-5): kvFSM satisfies the interface directly via its
305+
// LastAppliedIndex method, so no separate AppliedIndexReporter shim
306+
// is needed. The compile-time guard in kv/fsm_applied_index_iface_check.go
307+
// keeps this stable.
286308
func fsmAlreadyAtIndex(fsm StateMachine, want uint64) bool {
287-
reporter, ok := fsm.(AppliedIndexReporter)
309+
r, ok := fsm.(raftengine.AppliedIndexReader)
288310
if !ok { return false }
289-
reader := reporter.AppliedIndexReader()
290-
if reader == nil { return false }
291-
have, present, err := reader.LastAppliedIndex()
311+
have, present, err := r.LastAppliedIndex()
292312
if err != nil || !present { return false }
293313
return have >= want
294314
}
@@ -814,8 +834,12 @@ present.
814834
- `ApplyIndexAware` is **already** in `main`; this design only adds
815835
consumers.
816836
- The new opt-in interfaces (`AppliedIndexReader`,
817-
`AppliedIndexReporter`, `SnapshotHeaderApplier`) are additive.
837+
`AppliedIndexWriter`, `SnapshotHeaderApplier`) are additive.
818838
FSMs that don't implement them fall back to the current behaviour.
839+
(Round-1 / round-2 of this doc mentioned an `AppliedIndexReporter`
840+
factory-method shim; PR #915 round-5 superseded it by having
841+
`kvFSM` satisfy `AppliedIndexReader` directly via its
842+
`LastAppliedIndex` method.)
819843
- `metaAppliedIndexBytes` is new. Older fsm.db files don't have it.
820844
The `present=false` branch makes the first restart after upgrade
821845
fall back to full restore, populating the meta key from the next
@@ -863,7 +887,7 @@ restoreSnapshotState skipped (FSM at index %d, snapshot at %d, ceiling=%d, cutov
863887
| Branch | Content | Behaviour change |
864888
|---|---|---|
865889
| **B1** (this PR) | Design doc | None |
866-
| **B2** | `ApplyMutationsRaftAt` / `DeletePrefixAtRaftAt` overloads + meta-key bundling in both leaves + `pebbleStore.LastAppliedIndex()` (under `dbMu.RLock()`) + `pebbleStore.SetDurableAppliedIndex()` (under `dbMu.RLock()`, **`pebble.Sync` unconditionally**) + `kvFSM.AppliedIndexReader()` accessor + `kvFSM.SetDurableAppliedIndex` forwarding + thread `f.pendingApplyIdx` into the data-Apply leaves + BOTH `persistCreatedSnapshot` (`engine.go:2679`) AND `e.persistLocalSnapshotPayload` (`engine.go:4032`, the SnapshotCount-triggered hot path) call `SetDurableAppliedIndex` BEFORE the corresponding `persist.SaveSnap` | Meta key starts being written on every data Apply AND at every snapshot persist (both config-snapshot and steady-state local-snapshot paths). Skip is still disabled. Soak in production for one release. |
890+
| **B2** | `ApplyMutationsRaftAt` / `DeletePrefixAtRaftAt` overloads + meta-key bundling in both leaves + `pebbleStore.LastAppliedIndex()` (under `dbMu.RLock()`) + `pebbleStore.SetDurableAppliedIndex()` (under `dbMu.RLock()` + `applyMu.Lock()` RMW monotonic guard, **`pebble.Sync` unconditionally**) + `kvFSM.LastAppliedIndex()` directly satisfies `raftengine.AppliedIndexReader` (compile-time guard in `kv/fsm_applied_index_iface_check.go`) + `kvFSM.SetDurableAppliedIndex` forwarding + thread `f.pendingApplyIdx` into the data-Apply leaves + BOTH `persistCreatedSnapshot` (`engine.go:2679`) AND `e.persistLocalSnapshotPayload` (`engine.go:4032`, the SnapshotCount-triggered hot path) call `SetDurableAppliedIndex` BEFORE the corresponding `persist.SaveSnap` | Meta key starts being written on every data Apply AND at every snapshot persist (both config-snapshot and steady-state local-snapshot paths). Skip is still disabled. Soak in production for one release. |
867891
| **B3** | `restoreSnapshotState` skip gate + `applyHeaderStateOnSkip(snapPath, tok.CRC32C)` orchestrating size + footer-vs-tokenCRC + full-body-CRC verification using `internal/raftengine/etcd`'s existing helpers (matching `openAndRestoreFSMSnapshot`'s safety contract) + two-phase `SnapshotHeaderApplier` seam on `kvFSM` (`ParseSnapshotHeader(r io.Reader) (ceiling, cutover, err)` + pure `ApplySnapshotHeader(ceiling, cutover)`) + metrics + INFO log | **User-visible cold-start win.** |
868892
| **B4** | Lower `HEALTH_TIMEOUT_SECONDS` default once production data shows steady-state skip rate ≥ 90 % | Tighter ceiling; the env override remains honoured. |
869893

internal/raftengine/etcd/engine.go

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1998,6 +1998,15 @@ func (e *Engine) applyReadySnapshot(snapshot raftpb.Snapshot) error {
19981998
if err != nil {
19991999
return errors.Wrapf(err, "decode snapshot token index=%d", snapshot.Metadata.Index)
20002000
}
2001+
// B3/follow-up: also call SetDurableAppliedIndex(tok.Index) here
2002+
// after Restore so peer-after-InstallSnapshot populates the meta
2003+
// key. The local-snapshot persist path already bumps the live
2004+
// store (engine.persistLocalSnapshotPayload), but the receiving
2005+
// node's restored store inherits the pre-bump value embedded in
2006+
// the snapshot artifact. Design Non-Goals §
2007+
// docs/design/2026_06_02_idempotent_snapshot_restore.md:71-74
2008+
// scopes this out of Branch 2; see PR #915 round-4/5 codex P2 on
2009+
// engine.go:4077 for the rationale.
20012010
if err := openAndRestoreFSMSnapshot(e.fsm, fsmSnapPath(e.fsmSnapDir, tok.Index), tok.CRC32C); err != nil {
20022011
return errors.Wrapf(err, "restore fsm snapshot file index=%d crc=%08x", tok.Index, tok.CRC32C)
20032012
}

0 commit comments

Comments
 (0)