Commit ca0b4e9
authored
feat(snapshot-skip B3): cold-start restoreSnapshotState skip gate + CRC verifier + metrics (#934)
## Summary
Implements **Branch 3** of the cold-start snapshot-restore skip
optimisation designed in PR #910, building on PR #915 (B2 —
`metaAppliedIndex` plumbing). This is the **user-visible perf win** of
the series: after this lands, the engine restores the FSM only when the
on-disk state is actually behind the snapshot pointer; the common case
(local restart with a fresh fsm.db) collapses to a header read + CRC
verification, skipping the multi-GiB body restore that PR #909's
`HEALTH_TIMEOUT_SECONDS=300` band-aid was sized to absorb.
## Reading order (6 commits, designed for one-at-a-time review)
| # | commit | scope |
|---|---|---|
| 1 | `8312f4ba` | `raftengine.SnapshotHeaderApplier` two-phase
interface (`ParseSnapshotHeader` + `ApplySnapshotHeader`) |
| 2 | `024eb43c` | `kvFSM` implements both phases + compile-time guard |
| 3 | `933dab86` | `applyHeaderStateOnSkip` (3-step CRC, mirrors
`openAndRestoreFSMSnapshot`) + `restoreSnapshotState` skip gate |
| 4 | (folded into #3) | `fsmAlreadyAtIndex` + skip-gate wiring |
| 5 | (next) | metrics + INFO log: `raftengine.ColdStartObserver` +
`monitoring.ColdStartMetrics` + `Registry.ColdStartObserver()` |
| 6 | `ce277e66` | 9 new tests (skip-gate behaviour ×5 + CRC failure
modes ×3 + kvFSM contract ×1) |
(Commit 5 was merged into the metrics commit; the actual git log is 5
commits.)
## What this does
`restoreSnapshotState` branches on `decideSkipOutcome`:
```go
decision, have := decideSkipOutcome(fsm, tok.Index)
reportColdStart(obs, logger, decision, tok.Index, have)
if decision == coldStartSkip {
return applyHeaderStateOnSkip(fsm, fsmSnapPath(fsmSnapDir, tok.Index), tok.CRC32C)
}
return openAndRestoreFSMSnapshot(...) // unchanged
```
`decideSkipOutcome` returns one of 5 outcomes:
| outcome | trigger | action |
|---|---|---|
| `skip` | `LastAppliedIndex >= snap.Index` | header-only path |
| `execute` | `LastAppliedIndex < snap.Index` | full restore |
| `fallback_not_reader` | FSM doesn't implement `AppliedIndexReader` |
full restore |
| `fallback_missing_meta` | meta key absent (pre-upgrade fsm.db) | full
restore |
| `fallback_read_err` | `LastAppliedIndex` returned an error | full
restore |
The three fallback paths preserve the strictly-additive guarantee from
the design doc §4: over-restoring is always strictly safer than skipping
incorrectly. Errors from `LastAppliedIndex` **do not abort cold start**
— they collapse to fallback.
## CRC verification (mirrors `openAndRestoreFSMSnapshot`)
`applyHeaderStateOnSkip` runs the same three-step safety contract as the
full restore path. Failure of any step propagates a typed error WITHOUT
mutating FSM state:
| Step | Check | Error |
|---|---|---|
| 1 | `info.Size() < fsmMinFileSize` | `ErrFSMSnapshotTooSmall` |
| 2 | `readFSMFooter` vs `tokenCRC` | `ErrFSMSnapshotTokenCRC` |
| 3 | full-body `crc32.TeeReader` vs footer | `ErrFSMSnapshotFileCRC` |
The FSM's `ApplySnapshotHeader` (HLC ceiling + Stage 8a cutover) only
fires after all three pass.
## Metrics + INFO log
`raftengine.ColdStartObserver` interface lives in the parent
`raftengine` package so monitoring can implement it without circular
import. `monitoring.Registry` gains `ColdStartObserver()`. The engine
receives it through `OpenConfig.ColdStartObserver` — nil disables
metrics, the skip itself still runs.
Two Prometheus series:
```
elastickv_fsm_cold_start_restore_total{outcome,fallback_reason}
elastickv_fsm_cold_start_applied_index_gap{outcome}
```
Plus a structured zap.Info log on every cold-start with fsm_applied /
snapshot_index / gap and (for fallback) the reason enum.
## Design constraints honoured
- **§4 strictly-additive fallback**: `decideSkipOutcome` collapses every
uncertainty to a fallback variant; `LastAppliedIndex` errors do NOT
abort cold start.
- **§5 two-phase seam (round-7)**: `ParseSnapshotHeader` reads the v1/v2
header from a caller-supplied reader and drains the rest;
`ApplySnapshotHeader` is pure assignment. Splits the parse from the
side-effect so the engine can verify the CRC between them.
- **§5 CRC mirroring (round-6)**: 3-step verification (size +
footer-vs-tokenCRC + full-body-CRC) exactly matches
`openAndRestoreFSMSnapshot`. The full-body CRC pass IS the slow part of
this PR (~6s for a 6 GiB FSM at 1 GiB/s SSD read) — but still strictly
cheaper than the restore it replaces, which also reads the file once AND
writes a temp Pebble database via `restoreBatchLoopInto`.
- **§9 observability**: three outcomes + gap label; the design's
`not_reporter` label is named `not_reader` in the actual impl (matching
the round-5 PR #915 rename).
- **Non-Goals respected**: `Engine.applySnapshot` (engine.go:1641
InstallSnapshot hot path) is untouched. The TODO sentinel added in PR
#915 round-6 still names B3 as the follow-up site for the
peer-after-InstallSnapshot bump — that's not in scope here either.
## Test results
```text
go vet ./internal/raftengine/etcd/ ./internal/raftengine/ ./kv/ ./store/ ./monitoring/ → exit 0
golangci-lint run ... (same packages) → 0 issues
go test ./internal/raftengine/etcd/ ./kv/ ./store/ ./monitoring/ -short → ok ~57s total
go test ./internal/raftengine/etcd/ -run 'TestSkipGate|TestApplyHeaderStateOnSkip' → ok 0.028s
```
## What this does NOT do
- **Does NOT change** `HEALTH_TIMEOUT_SECONDS=300`. That's Branch 4,
gated on production data showing steady-state skip rate ≥ 90%.
- **Does NOT** wire `Engine.applySnapshot` to populate
`metaAppliedIndex` for peers receiving InstallSnapshot. The TODO
sentinel from PR #915 round-6 still flags this; addressing it requires a
separate design pass.
- **Does NOT** add an idle-cluster integration test
(`ELASTICKV_RAFT_SNAPSHOT_COUNT=10` end-to-end). The unit tests cover
the gate decision matrix; an integration test would prove the codex
round-3 P2 scenario is closed end-to-end in production-like setup, which
is most valuable as a follow-up after B3 has soaked.
## Soak plan
Same as B2: deploy and observe
`elastickv_fsm_cold_start_restore_total{outcome="skipped"}` ramp up
across cold starts. The design target is steady-state skip rate ≥ 90%;
the per-outcome label distribution and gap distribution are the primary
signals.
After one release of soak, Branch 4 can lower `HEALTH_TIMEOUT_SECONDS`.
## Refs
- PR #910 — design doc (rounds 1-7)
- PR #915 — Branch 2 plumbing (merged); this PR depends on
`metaAppliedIndex` being populated
- PR #909 — `HEALTH_TIMEOUT_SECONDS` band-aid this series eventually
obviates
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Cold-start lifecycle callbacks added and wired through the raft engine
to report skip/execute/fallback outcomes and applied-index gaps.
* Prometheus metrics for cold-start outcomes and applied-index gap.
* Snapshot-header read/apply contract introduced; KV FSM implements
header apply and volatile-entry classification to support header-only
skips and volatile-only replay on duplicates.
* **Tests**
* Extensive unit/integration tests covering skip/execute/fallback
decisions, header verification, volatile duplicate replay, and updated
restore-call signatures.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->19 files changed
Lines changed: 1618 additions & 69 deletions
File tree
- internal/raftengine
- etcd
- kv
- monitoring
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
Lines changed: 199 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
| 179 | + | |
| 180 | + | |
| 181 | + | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
133 | 133 | | |
134 | 134 | | |
135 | 135 | | |
136 | | - | |
| 136 | + | |
137 | 137 | | |
138 | 138 | | |
139 | 139 | | |
| |||
151 | 151 | | |
152 | 152 | | |
153 | 153 | | |
154 | | - | |
| 154 | + | |
155 | 155 | | |
156 | 156 | | |
157 | 157 | | |
| |||
169 | 169 | | |
170 | 170 | | |
171 | 171 | | |
172 | | - | |
| 172 | + | |
173 | 173 | | |
174 | 174 | | |
175 | 175 | | |
| |||
200 | 200 | | |
201 | 201 | | |
202 | 202 | | |
203 | | - | |
| 203 | + | |
204 | 204 | | |
205 | 205 | | |
206 | 206 | | |
| |||
226 | 226 | | |
227 | 227 | | |
228 | 228 | | |
229 | | - | |
| 229 | + | |
230 | 230 | | |
231 | 231 | | |
232 | 232 | | |
| |||
258 | 258 | | |
259 | 259 | | |
260 | 260 | | |
261 | | - | |
| 261 | + | |
262 | 262 | | |
263 | 263 | | |
264 | 264 | | |
| |||
326 | 326 | | |
327 | 327 | | |
328 | 328 | | |
329 | | - | |
| 329 | + | |
330 | 330 | | |
331 | 331 | | |
332 | 332 | | |
| |||
335 | 335 | | |
336 | 336 | | |
337 | 337 | | |
338 | | - | |
| 338 | + | |
339 | 339 | | |
340 | 340 | | |
341 | 341 | | |
| |||
365 | 365 | | |
366 | 366 | | |
367 | 367 | | |
368 | | - | |
| 368 | + | |
369 | 369 | | |
370 | 370 | | |
371 | 371 | | |
| |||
396 | 396 | | |
397 | 397 | | |
398 | 398 | | |
399 | | - | |
| 399 | + | |
400 | 400 | | |
401 | 401 | | |
402 | 402 | | |
| |||
0 commit comments