Commit 13a1b61
feat(snapshot-skip B2): plumb metaAppliedIndex through raft-Apply + both snapshot persist sites (#915)
## Summary
Implements **Branch 2** of the cold-start snapshot-restore skip
optimisation designed in PR #910. After this lands the
`metaAppliedIndex` Pebble meta key is durably written on every
raft-Apply data mutation AND at every snapshot persist — but the skip
gate itself (Branch 3) is NOT yet wired, so behaviour is observationally
identical to `main` except for the new meta key in fsm.db. Branch 2 is
meant to soak in production for at least one release before Branch 3
enables the skip; this PR is intentionally a no-op-from-the-outside
change with comprehensive plumbing.
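
To make the "written on every raft-Apply data mutation" claim concrete, here is a minimal sketch of bundling the meta key into the same batch as the data mutations. It is not the pebbleStore code: `batch` and `buildBatch` are hypothetical stand-ins, the key string is taken from the soak-plan command in this description, and the 8-byte little-endian encoding follows the expected value stated there.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// metaAppliedIndexKey mirrors the key used in the soak-plan command.
// The real constant lives in the store package; this copy is illustrative.
const metaAppliedIndexKey = "_meta_applied_index"

// batch is a hypothetical stand-in for an atomic write batch.
type batch map[string][]byte

// buildBatch stages the data mutations and, when appliedIndex is non-zero,
// the meta key in the same batch so both become visible together.
// appliedIndex == 0 models the legacy opt-out path.
func buildBatch(mutations map[string][]byte, appliedIndex uint64) batch {
	b := make(batch, len(mutations)+1)
	for k, v := range mutations {
		b[k] = v
	}
	if appliedIndex != 0 {
		buf := make([]byte, 8)
		binary.LittleEndian.PutUint64(buf, appliedIndex)
		b[metaAppliedIndexKey] = buf
	}
	return b
}

func main() {
	data := map[string][]byte{"k": []byte("v")}

	withIndex := buildBatch(data, 42)
	fmt.Println(len(withIndex), withIndex[metaAppliedIndexKey])

	legacy := buildBatch(data, 0)
	_, bumped := legacy[metaAppliedIndexKey]
	fmt.Println(len(legacy), bumped)
}
```

Staging the key in the same batch as the mutation is what keeps the stored index from running ahead of, or falling behind, the data it describes.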
## Reading order (6 commits, designed to be reviewed one at a time)
| # | commit | scope |
|---|---|---|
| 1 | `2339a6f2` | raftengine: opt-in interfaces (`AppliedIndexReader` / `AppliedIndexWriter`) |
| 2 | `525fc152` | pebbleStore: `metaAppliedIndex` const + `LastAppliedIndex` + `SetDurableAppliedIndex` (with `pebble.Sync` UNCONDITIONALLY) |
| 3 | `aa9b8acc` | `MVCCStore` interface extension: `ApplyMutationsRaftAt` / `DeletePrefixAtRaftAt` overloads, threading appliedIndex through `applyMutationsWithOpts` + `deletePrefixAtWithOpts` |
| 4 | `7cd72bda` | kvFSM seam wiring: `AppliedIndexReader()` / `SetDurableAppliedIndex()` accessors + all 7 data-Apply leaves switched to `*RaftAt` with `f.pendingApplyIdx` |
| 5 | `f1e8748c` | engine hooks at BOTH snapshot persist sites: `persistCreatedSnapshot` + `e.persistLocalSnapshotPayload` call `SetDurableAppliedIndex` BEFORE `persist.SaveSnap` |
| 6 | `2c42f7d6` | tests (10 new tests across store + engine) |
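
For reviewers starting at commit 1, the opt-in pattern can be sketched roughly as follows. The method signatures are assumptions inferred from this description (`LastAppliedIndex` returning three values, `SetDurableAppliedIndex` taking an index), and `indexedStore`, `legacyStore` and `readerFor` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// AppliedIndexReader is an assumed shape for the opt-in read seam.
type AppliedIndexReader interface {
	LastAppliedIndex() (index uint64, ok bool, err error)
}

// AppliedIndexWriter is an assumed shape for the opt-in write seam.
type AppliedIndexWriter interface {
	SetDurableAppliedIndex(index uint64) error
}

// indexedStore is a hypothetical store that implements both seams.
type indexedStore struct {
	index uint64
	set   bool
}

func (s *indexedStore) LastAppliedIndex() (uint64, bool, error) {
	return s.index, s.set, nil
}

func (s *indexedStore) SetDurableAppliedIndex(index uint64) error {
	s.index, s.set = index, true
	return nil
}

// legacyStore is a hypothetical store that predates the seam.
type legacyStore struct{}

// readerFor returns nil when the store does not opt in, so the caller
// can fall back to the conservative path.
func readerFor(store any) AppliedIndexReader {
	if r, ok := store.(AppliedIndexReader); ok {
		return r
	}
	return nil
}

func main() {
	s := &indexedStore{}
	_ = s.SetDurableAppliedIndex(7)

	for _, store := range []any{s, legacyStore{}} {
		r := readerFor(store)
		if r == nil {
			fmt.Println("no seam: fall back to full restore")
			continue
		}
		idx, ok, err := r.LastAppliedIndex()
		fmt.Println(idx, ok, err)
	}
}
```

Because the seam is discovered by type assertion, stores that do not implement it keep compiling and behave exactly as before.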
## Design constraints honoured
All from `docs/design/2026_06_02_idempotent_snapshot_restore.md`:
- **§2 "Why both leaves"**: meta key bundle in BOTH
`applyMutationsWithOpts` AND `deletePrefixAtWithOpts` so DEL_PREFIX
entries don't silently leave `LastAppliedIndex` behind. Tested by
`TestDeletePrefixAtRaftAt_BundlesMetaAppliedIndex`.
- **§3 `dbMu.RLock()`**: both `LastAppliedIndex` and
`SetDurableAppliedIndex` acquire the read-lock, matching the
lock-ordering discipline at `lsm_store.go:153 / :553 / :675`.
- **§4 fallback policy**: `AppliedIndexReader()` returns nil when the
store doesn't implement the seam; `LastAppliedIndex` returns `(0, false,
nil)` for missing OR truncated meta key. Branch 3 will then fall back to
full restore conservatively.
- **§6 `ELASTICKV_FSM_SYNC_MODE=nosync` mode**: `SetDurableAppliedIndex`
is **pinned to `pebble.Sync` unconditionally**. Rationale documented at
length in the method's doc-comment — once `persist.SaveSnap` returns,
WAL compaction discards every log entry ≤ `snap.Metadata.Index`, so
there's no source to replay the meta key bump from. The cost is one extra
fsync per snapshot persist (rare; default `SnapshotCount=10000`). Tested by
`TestSetDurableAppliedIndex_UsesPebbleSync`.
- **§6 "HLC lease entries — checkpoint at snapshot persist"**: BOTH
`persistCreatedSnapshot` (config snapshots) AND
`e.persistLocalSnapshotPayload` (steady-state `SnapshotCount`-triggered
hot path) call the hook. Crash ordering at both sites is tested by
`TestPersistCreatedSnapshot_*`.
- **§8 compatibility**: `StateMachine.Apply`'s public signature is
unchanged. New interfaces are opt-in. Old call sites
(`ApplyMutationsRaft` without `*At`) still work; they simply pass
`appliedIndex=0` to opt out of the meta key bump.
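
The ordering constraint from §6 (pin the applied index before `persist.SaveSnap`) can be illustrated with a recording fake. This is a sketch under assumptions, not the engine code: `recorder` and `persistSnapshot` are hypothetical, `SaveSnap` is simplified to take an index rather than a snapshot message, and aborting the persist when the pin fails is an assumed policy that this description does not spell out.

```go
package main

import (
	"errors"
	"fmt"
)

// recorder is a hypothetical fake that logs the order of calls.
type recorder struct {
	calls   []string
	failPin bool
}

// SetDurableAppliedIndex records the pin and optionally injects a failure.
func (r *recorder) SetDurableAppliedIndex(index uint64) error {
	r.calls = append(r.calls, fmt.Sprintf("pin:%d", index))
	if r.failPin {
		return errors.New("injected pin failure")
	}
	return nil
}

// SaveSnap records the snapshot persist (simplified signature).
func (r *recorder) SaveSnap(index uint64) error {
	r.calls = append(r.calls, fmt.Sprintf("save:%d", index))
	return nil
}

// persistSnapshot pins the applied index first. Only if that succeeds does
// it persist the snapshot, after which older log entries may be compacted.
func persistSnapshot(r *recorder, index uint64) error {
	if err := r.SetDurableAppliedIndex(index); err != nil {
		return fmt.Errorf("pin applied index %d: %w", index, err)
	}
	return r.SaveSnap(index)
}

func main() {
	good := &recorder{}
	fmt.Println(persistSnapshot(good, 10000), good.calls)

	bad := &recorder{failPin: true}
	fmt.Println(persistSnapshot(bad, 10000), bad.calls)
}
```

In the failure case the fake never sees a `save` call, which is the property a crash-ordering test would want to assert.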
## Test results
```
go vet ./... → 0 issues
go test ./store/ -short → ok 29.4s
go test ./kv/ -short → ok 10.4s
go test ./internal/raftengine/... -short → ok 32.8s
go test ./store/ -run 'TestLastAppliedIndex|TestSetDurable...|TestApply...|TestDelete...' → ok 1.6s
go test ./internal/raftengine/etcd/ -run 'TestRecording|TestPersistCreatedSnapshot_' → ok 0.03s
```
10 new tests added (see commit `2c42f7d6` for the full inventory).
## What this does NOT do
- **Does NOT enable the skip gate.** `restoreSnapshotState` still always
restores. Branch 3 wires the `fsmAlreadyAtIndex` check +
`applyHeaderStateOnSkip` + the two-phase `SnapshotHeaderApplier` seam.
- **Does NOT change `HEALTH_TIMEOUT_SECONDS=300`.** Branch 4 lowers it
once Branch 3 has soaked.
- **Does NOT touch the snapshot-install hot path**
(`Engine.applySnapshot`) per Non-Goals in the design.
## Soak plan
Branch 2 should run in production for at least one release before Branch
3 opens. Operators can verify the meta key is being written via:
```bash
# Inspect a pebble fsm.db (read-only)
ldb --db=/var/lib/elastickv/n3/fsm.db get '_meta_applied_index' --hex
# Expected: 8 little-endian bytes equal to the current applied index
```
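
To turn the hex value into a number, a small helper along these lines may be handy. It assumes the tool prints the value as a hex string, optionally prefixed with `0x`; `decodeAppliedIndex` is a hypothetical helper, not part of this PR. It rejects anything that is not exactly 8 bytes, loosely mirroring the §4 policy of treating a truncated key as absent.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"strings"
)

// decodeAppliedIndex parses a hex-encoded 8-byte little-endian value.
// It returns (0, false) for malformed or wrongly sized input.
func decodeAppliedIndex(hexValue string) (uint64, bool) {
	s := strings.TrimPrefix(strings.TrimSpace(hexValue), "0x")
	raw, err := hex.DecodeString(s)
	if err != nil || len(raw) != 8 {
		return 0, false
	}
	return binary.LittleEndian.Uint64(raw), true
}

func main() {
	// Bytes 10 27 00 00 00 00 00 00 are 10000 in little-endian order.
	idx, ok := decodeAppliedIndex("0x1027000000000000")
	fmt.Println(idx, ok)

	// A truncated value is rejected rather than guessed at.
	idx, ok = decodeAppliedIndex("0x1027")
	fmt.Println(idx, ok)
}
```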
## Refs
- PR #910 (design) — round 1..7 design history + retraction sections
explaining the design constraints this PR honours
- PR #909 — `HEALTH_TIMEOUT_SECONDS` band-aid that this series
eventually obviates
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
  * Durable tracking of Raft-applied indexes to ensure consistent
    snapshot/save ordering.
* **Bug Fixes**
  * Improved snapshot persistence reliability by pinning durable applied
    index before snapshot writes.
  * Stronger durability for writes bundled with Raft entry indices,
    reducing restore/recovery surprises.
* **Tests**
  * Added comprehensive tests covering applied-index ordering, failure
    handling, and persistence behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

13 files changed
Lines changed: 1127 additions & 49 deletions