Commit 6350e64
feat(main): Stage 6C-2d — wire ErrSidecarBehindRaftLog startup guard into main.go (#784)
## Summary
Stage 6C-2d per the [PR #762
plan](#762) and the 6C
sub-decomposition landed in PR #781 / #782 / #783.
Ties the encryption-side guard primitive ([PR #782
`GuardSidecarBehindRaftLog`](#782))
to the raftengine-side scanner ([PR #783
`Engine.EncryptionScanner()`](#783))
in `main.go`'s startup phase. **Completes the §9.1
`ErrSidecarBehindRaftLog` work end-to-end.**
## What this PR ships
### `main_encryption_startup_guard.go` (new)
| Function | Role |
|---|---|
| `checkSidecarBehindRaftLog(runtimes, defaultGroup, sidecarPath, encryptionEnabled)` | Locates the default-group runtime, type-asserts its engine to local `encryptionGapEngine` interface, reads the §5.1 sidecar, delegates to the inner helper |
| `runSidecarBehindRaftLogGuard(gapEngine, sidecarPath, defaultGroup)` | Per-engine inner half. Split out so tests use a stub instead of constructing a full `raftengine.Engine` |
| `chainEncryptionStartupGuard(prevErr, ...)` | Composes the guard with the existing `buildShardGroups` error path. Keeps `run()`'s cyclop budget intact |
### Lifecycle phase ordering
```
1. loadKEKAndRunStartupGuards ← 6B-2 RPC gate + 6C-1/6C-2 flag/sidecar/exhaustion
2. buildShardGroups ← opens every shard's engine
3. chainEncryptionStartupGuard ← 6C-2d gap-coverage guard (NEW)
4. gRPC servers / accept loop
```
Step 3 runs in a **different lifecycle phase** from step 1: the guard
requires the engine to be open and its applied index populated from
WAL+snapshot replay, so it cannot run until step 2 has returned.
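
The composition in step 3 can be sketched as below. This is a minimal illustration, not the code shipped in `main_encryption_startup_guard.go`: `chainGuard` and `errBuild` are hypothetical names, and the real `chainEncryptionStartupGuard` takes additional arguments that are elided here.

```go
package main

import (
	"errors"
	"fmt"
)

// errBuild stands in for a failure returned by buildShardGroups.
var errBuild = errors.New("buildShardGroups: open shard engine failed")

// chainGuard is a hypothetical, simplified stand-in for
// chainEncryptionStartupGuard: a non-nil prevErr is returned verbatim and
// the guard is skipped; otherwise the guard's own result is returned.
func chainGuard(prevErr error, guard func() error) error {
	if prevErr != nil {
		return prevErr
	}
	return guard()
}

func main() {
	guard := func() error {
		fmt.Println("gap-coverage guard ran (engines are open)")
		return nil
	}

	// Phase 2 succeeded: phase 3 runs the guard.
	fmt.Println("startup error:", chainGuard(nil, guard))

	// Phase 2 failed: the guard is skipped and the original error survives.
	fmt.Println("startup error:", chainGuard(errBuild, guard))
}
```

Folding the `prevErr != nil` branch into the helper is what keeps the extra conditional out of `run()`.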
### Skip conditions (`checkSidecarBehindRaftLog` returns nil)
- `encryptionEnabled` is false — no encryption opt-in, no gap to refuse
on
- `sidecarPath` is empty — operator hasn't configured a sidecar location
- On-disk sidecar file absent — bootstrap hasn't committed yet
- Default-group runtime missing — no engine to query
- Engine not open — earlier startup-guard layer would have caught this
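
A rough sketch of the fast-skip checks follows. `shouldRunGuard` is a hypothetical helper, and both its signature and the order of the checks are assumptions; the real function takes the runtimes slice and performs the lookup itself. The point it illustrates is that only a not-exist stat result counts as a skip, while any other I/O error is wrapped and returned.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// shouldRunGuard is a hypothetical helper mirroring the skip conditions.
// It reports whether the gap guard should run; a nil error with run=false
// means "skip, nothing to refuse on".
func shouldRunGuard(encryptionEnabled bool, sidecarPath string, haveDefaultRuntime, engineOpen bool) (bool, error) {
	if !encryptionEnabled || sidecarPath == "" {
		return false, nil // no opt-in, or no sidecar location configured
	}
	if _, err := os.Stat(sidecarPath); err != nil {
		if errors.Is(err, fs.ErrNotExist) {
			return false, nil // bootstrap has not committed a sidecar yet
		}
		// Any other stat failure is a real error, not a skip.
		return false, fmt.Errorf("stat sidecar %q: %w", sidecarPath, err)
	}
	if !haveDefaultRuntime || !engineOpen {
		return false, nil // no engine to query
	}
	return true, nil
}

func main() {
	run, err := shouldRunGuard(true, "no-such-dir-for-sketch/sidecar.json", true, true)
	fmt.Println("absent sidecar:", run, err)

	run, err = shouldRunGuard(true, "bad\x00path", true, true)
	fmt.Println("NUL in path:", run, err != nil)
}
```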
### Why scope is restricted to the default group
The §5.1 sidecar tracks a **single** `raft_applied_index`, advanced only
by `WriteSidecar` calls in `ApplyBootstrap` / `ApplyRotation` on the
default group's FSM. Other shards' Raft logs don't carry encryption FSM
entries — running the guard against them would always report caught-up
(gap == 0). `findDefaultGroupRuntime` silently skips non-matching
runtimes.
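
The lookup can be pictured roughly as follows. `shardRuntime` and its fields are hypothetical stand-ins for the real runtime type, and `findDefaultGroup` is a simplified assumption of what `findDefaultGroupRuntime` does: return the one matching runtime, or nil so the caller skips the guard.

```go
package main

import "fmt"

// shardRuntime is a hypothetical stand-in for the per-shard runtime type.
type shardRuntime struct {
	groupID uint64
	name    string
}

// findDefaultGroup returns the runtime whose group ID matches defaultGroup.
// Non-matching (and nil) entries are skipped silently; nil means no match.
func findDefaultGroup(runtimes []*shardRuntime, defaultGroup uint64) *shardRuntime {
	for _, rt := range runtimes {
		if rt == nil || rt.groupID != defaultGroup {
			continue
		}
		return rt
	}
	return nil
}

func main() {
	runtimes := []*shardRuntime{
		{groupID: 2, name: "shard-2"},
		{groupID: 1, name: "default"},
	}
	if rt := findDefaultGroup(runtimes, 1); rt != nil {
		fmt.Println("guard target:", rt.name)
	}
	fmt.Println("unknown group found:", findDefaultGroup(runtimes, 9) != nil)
}
```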
## Test coverage (11 new tests)
| Test | Path |
|---|---|
| `_DisabledNoop` | `encryptionEnabled=false` → skip |
| `_NoSidecarPathNoop` | empty path → skip |
| `_SidecarAbsentNoop` | on-disk file missing → skip |
| `_SidecarStatError` | I/O error (NUL path) → wrapped, NOT `not-exist` |
| `_NoRuntimes` | no default-group runtime → skip |
| `_CaughtUp` | sidecar idx >= engine idx → pass, scanner never consulted |
| `_GapNotCovered` | gap exists, scanner says no → pass |
| `_GapCovered` | gap covers relevant entry → `ErrSidecarBehindRaftLog` |
| `_ScannerError` | scanner error → propagated wrapped, NOT `ErrSidecarBehindRaftLog` |
| `_PropagatesPrevError` | chain with non-nil prev → returns prev verbatim |
| `_NilPrevRunsGuard` | chain with nil prev → forwards to guard |
Tests use `stubGapEngine` + `stubScanner` to exercise the per-engine
inner function directly, avoiding the full `raftengine.Engine` interface
(which would require panic-stubs for ~30 unrelated methods).
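
To make the stub approach concrete, here is a hedged sketch of a narrow interface plus stubs driving the four per-engine outcomes from the table. Every name below (`gapEngine`, `gapScanner`, `HasRelevantEntry`, `runGuard`, the lowercase sentinel) is an assumption made for illustration and does not reproduce the real `encryptionGapEngine` or scanner signatures; the gap interval `(sidecar, engine]` is likewise assumed.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for ErrSidecarBehindRaftLog and a scanner I/O failure.
var (
	errSidecarBehindRaftLog = errors.New("sidecar behind raft log")
	errStubScan             = errors.New("stub scanner failure")
)

// gapScanner and gapEngine are hypothetical narrow interfaces.
type gapScanner interface {
	HasRelevantEntry(from, to uint64) (bool, error)
}

type gapEngine interface {
	AppliedIndex() uint64
	Scanner() gapScanner
}

// runGuard refuses only when the gap (sidecarIdx, engineIdx] contains an
// encryption-relevant entry. Scanner errors are wrapped, not converted.
func runGuard(eng gapEngine, sidecarIdx uint64) error {
	engineIdx := eng.AppliedIndex()
	if sidecarIdx >= engineIdx {
		return nil // caught up: the scanner is never consulted
	}
	covered, err := eng.Scanner().HasRelevantEntry(sidecarIdx+1, engineIdx)
	if err != nil {
		return fmt.Errorf("scan raft log gap (%d, %d]: %w", sidecarIdx, engineIdx, err)
	}
	if covered {
		return fmt.Errorf("%w: sidecar=%d engine=%d", errSidecarBehindRaftLog, sidecarIdx, engineIdx)
	}
	return nil
}

func isSidecarBehind(err error) bool { return errors.Is(err, errSidecarBehindRaftLog) }

type stubScanner struct {
	covered bool
	err     error
	calls   int
}

func (s *stubScanner) HasRelevantEntry(from, to uint64) (bool, error) {
	s.calls++
	return s.covered, s.err
}

type stubEngine struct {
	applied uint64
	scanner *stubScanner
}

func (e *stubEngine) AppliedIndex() uint64 { return e.applied }
func (e *stubEngine) Scanner() gapScanner  { return e.scanner }

func main() {
	eng := &stubEngine{applied: 10, scanner: &stubScanner{covered: true}}
	fmt.Println("caught up:", runGuard(eng, 10), "scanner calls:", eng.scanner.calls)
	fmt.Println("gap covered:", runGuard(eng, 7))
}
```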
## Posture matrix (before/after this PR)
| Scenario | pre-this-PR | post-this-PR |
|---|---|---|
| Sidecar caught up to engine | boot, encrypted | boot, encrypted |
| Sidecar behind, gap harmless (no §5.5 opcodes) | boot, encrypted | boot, encrypted |
| Sidecar behind, gap covers `OpBootstrap` / `OpRotation` | **boot, HaltApply on first post-cutover read with `unknown_key_id`** | refuse: `ErrSidecarBehindRaftLog` |
| Sidecar behind, gap covers only `OpRegistration` | boot, registry replays cleanly | boot (PR #782 narrowed predicate excludes 0x03) |
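
The last two rows hinge on which opcodes the predicate treats as relevant. A minimal sketch, assuming a one-byte opcode: only `0x03` for `OpRegistration` comes from the matrix above; the values `0x01` and `0x02` for bootstrap and rotation, and the helper names, are placeholders chosen for illustration.

```go
package main

import "fmt"

type opcode byte

const (
	opBootstrap    opcode = 0x01 // assumed value, for illustration only
	opRotation     opcode = 0x02 // assumed value, for illustration only
	opRegistration opcode = 0x03 // excluded by the narrowed predicate
)

// isGuardRelevant reports whether missing this entry from the sidecar
// should refuse startup. Registration entries replay cleanly, so they
// do not count.
func isGuardRelevant(op opcode) bool {
	switch op {
	case opBootstrap, opRotation:
		return true
	default:
		return false
	}
}

// gapCoversRelevantEntry reports whether any opcode in the gap is relevant.
func gapCoversRelevantEntry(gap []opcode) bool {
	for _, op := range gap {
		if isGuardRelevant(op) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("registration only:", gapCoversRelevantEntry([]opcode{opRegistration}))
	fmt.Println("with rotation:", gapCoversRelevantEntry([]opcode{opRegistration, opRotation}))
}
```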
## Caller audit (per cron directive)
No public function signatures changed. The new functions in
`main_encryption_startup_guard.go` are package-internal. `main.go run()`
is the sole production caller; tests call the inner helpers directly.
The semantic effect of the wiring: nodes whose sidecar is behind by an
encryption-relevant interval now refuse at startup instead of silently
booting into a HaltApply later. This is a tighter failure boundary —
operators see the §9.1 typed refusal pointing at `encryption
resync-sidecar` rather than triaging downstream `unknown_key_id` errors.
## Stage 6B-2 / 6C-1 / 6C-2 / 6C-2b / 6C-2c invariants preserved
All earlier guards and their lifecycle phases unchanged. This PR
composes existing primitives — no new sentinel, no new opcode, no new
wire format.
## Five-lens self-review
1. **Data loss** — net-positive. Catches the partial-write-crash failure
mode between an encryption-relevant Raft commit and the sidecar update.
The guard primitive has existed since PR #782 but was operator-inert
until this wiring.
2. **Concurrency / distributed failures** — runs at startup before any
goroutine fan-out / gRPC accept. No shared state introduced.
3. **Performance** — single sidecar read + single scanner call per
process start; the scanner is skipped entirely when the sidecar is
already caught up.
4. **Data consistency** — the only §9.1 mechanism that catches "Raft
committed but sidecar missed it" partial writes. Without it the first
post-cutover read would HaltApply.
5. **Test coverage** — 11 new tests cover every branch. Full main
package test suite passes with no regressions.
## Test plan
- [x] `go test -race -timeout=180s .` — PASS (all main package tests +
11 new)
- [x] `go test -race -timeout=120s ./internal/raftengine/etcd/...
./internal/encryption/...` — PASS (no regressions)
- [x] `go build ./...` — PASS
- [x] `golangci-lint run .` — 0 issues (cyclop holds at 10 via the
composition helper)
- [ ] Full Jepsen suite — not run; the guard is purely a startup refusal
that runs before any Raft entry can be served
## Plan
**Stage 6C is now complete** (6C-1 ✅ + 6C-2 ✅ + 6C-2b ✅ + 6C-2c ✅ +
6C-2d this PR).
Next stages per PR #762:
- **6C-3** — `ErrNodeIDCollision` + `ErrLocalEpochRollback` (bundles
with Stage 6D's capability fan-out)
- **6D** — `enable-storage-envelope` admin RPC + §7.1 Phase-1 cutover
(**now UNBLOCKED** — all `ErrSidecarBehindRaftLog` work has shipped)
- **6E** — `enable-raft-envelope` admin RPC + §7.1 Phase-2 cutover
(bundles 6C-4)
- **6F** — `--encryption-rotate-on-startup` ergonomics
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
  * Added encryption startup validation to ensure sidecar state is
    synchronized with the raft log during system initialization, preventing
    startup if consistency checks fail.
* **Tests**
  * Comprehensive test coverage for encryption startup guard behavior,
    including fast-skip scenarios, gap detection, and error propagation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

11 files changed
Lines changed: 869 additions & 53 deletions
File tree
- docs/design
- internal
- encryption
- raftengine
- etcd
- kv