Commit 7a05466
docs(encryption): Stage 6D-1 proposed design — enable-storage-envelope cutover (#786)
## Summary
Doc-only PR. Stage 6D-1 per the [parent design's milestone
table](https://github.com/bootjp/elastickv/blob/main/docs/design/2026_04_29_partial_data_at_rest_encryption.md),
now unblocked by PR #784 (6C-2d).
Per `CLAUDE.md`'s design-doc-first workflow: this doc lands BEFORE any
6D implementation so the decomposition, wire-format choice, RPC
contract, capability fan-out helper, storage-layer toggle, bundled 6C-3
startup guards, and refusal posture can be reviewed as one
self-contained record.
## What this ships
`docs/design/2026_05_18_proposed_6d_enable_storage_envelope.md` — the
Stage 6D as-implemented design. The parent design covers 6D abstractly
in §6.6 + §7.1; this doc pins the concrete sub-PR decomposition,
wire-format byte choice, and refusal posture.
## Sub-PR decomposition
| Sub-PR | Surface | Operator-visible? |
|---|---|---|
| **6D-1** | This design doc | No (doc-only) |
| **6D-2** | 6C-3 startup guards (`ErrNodeIDCollision` + `ErrLocalEpochRollback`) | No (startup-time guards) |
| **6D-3** | Capability fan-out helper in `internal/admin/` | No (library) |
| **6D-4** | Wire format: `RotateSubEnableStorageEnvelope = 0x01` + `ApplyRotation` sub-tag dispatch | No (writes only flag in sidecar) |
| **6D-5** | §6.2 storage-layer toggle: `PutAt` reads `StorageEnvelopeActive` | No (toggle stays false until 6D-6) |
| **6D-6** | `EnableStorageEnvelope` admin RPC + CLI + integration test | **Yes** (first user-visible cutover) |
Each sub-PR is independently revertable. 6D-2 through 6D-5 land
operator-inert (the cutover isn't reachable from any CLI until 6D-6
wires it).
## Highlights
- **Wire format**: new sub-tag `RotateSubEnableStorageEnvelope = 0x01`
on the existing 0x05 OpRotation payload. No new opcode. Validation at
both propose AND apply time: `Purpose == PurposeStorage`, `Wrapped ==
nil`, `DEKID == sidecar.Active.Storage`, `sidecar.StorageEnvelopeActive
== false` (idempotency guard for retried RPCs).
- **RPC**: leader-only `EncryptionAdmin.EnableStorageEnvelope`.
Followers return `FailedPrecondition` with leader hint (same as
`RotateDEK`). Failure-mode table in §8 lists five typed sentinels:
three new (`ErrEncryptionNotBootstrapped`,
`ErrStorageEnvelopeAlreadyActive`, `ErrCapabilityCheckFailed`) plus two
existing (`ErrEncryptionMutatorsDisabled`, `ErrNotLeader`).
- **Voters ∪ Learners fan-out** in
`internal/admin/capability_fanout.go`. Fresh probing every call (no
caching, per the parent design's "stale cached capability state cannot
trigger a premature cutover" rationale). Default timeout = one
`RaftElectionTimeout`.
- **6C-3 guards bundled into 6D-2**:
  - `ErrNodeIDCollision`: refuses startup if two members hash to the same
    16-bit `node_id` (catches GCM nonce-reuse risk).
  - `ErrLocalEpochRollback`: refuses startup if the sidecar's `local_epoch`
    is strictly less than the writer-registry record (catches a sidecar
    restored from an old backup).
- **§6.2 storage-layer toggle**: `store/mvcc_store.go::Put` reads
`StorageEnvelopeActive` at write time. Pre-cutover versions stay
cleartext; post-cutover writes are encrypted with `encryption_state =
0b01`; mixed versions within a single key are correct by construction
per §5.4 of the parent doc.
## Five-lens self-review of THIS doc
1. **Data loss** — net-neutral. Cutover doesn't retire DEKs or delete
values.
2. **Concurrency** — apply is deterministic; every replica flips
`StorageEnvelopeActive` at the same apply index. Pre-flight fan-out is
fresh per cutover.
3. **Performance** — per-Put adds one AES-GCM seal + 16-byte tag +
1-byte header (already benchmarked).
4. **Data consistency** — cutover entry's apply index is recorded in
`RaftAppliedIndex` via the 6C-2d advancement; §9.1 guard then refuses
startup if a later restart's sidecar is missing the cutover entry.
5. **Test coverage** — five new unit-test surfaces (one per sub-PR) +
Jepsen extension at 6D-6.
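
The per-Put size overhead in item 3 can be checked with the Go standard library. The sketch below is illustrative only: `sealValue`, `newGCM`, and the placeholder header byte are hypothetical, and where the nonce is stored is not covered by this summary, so the nonce is passed in and not counted.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// placeholderHeader is a hypothetical 1-byte envelope header; the real
// header layout is defined in the design doc, not here.
const placeholderHeader byte = 0x01

// newGCM builds an AES-GCM AEAD with the standard 16-byte tag.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealValue returns header || ciphertext || tag for one stored value.
func sealValue(gcm cipher.AEAD, nonce, plaintext []byte) []byte {
	out := make([]byte, 0, 1+len(plaintext)+gcm.Overhead())
	out = append(out, placeholderHeader)
	// Seal appends the ciphertext and authentication tag to out.
	return gcm.Seal(out, nonce, plaintext, nil)
}

func main() {
	key := make([]byte, 32) // AES-256 key, random for the demo
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	gcm, err := newGCM(key)
	if err != nil {
		panic(err)
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		panic(err)
	}

	plaintext := []byte("example value")
	sealed := sealValue(gcm, nonce, plaintext)

	// Overhead is the 16-byte tag plus the 1-byte header.
	fmt.Println("tag bytes:", gcm.Overhead())
	fmt.Println("total overhead:", len(sealed)-len(plaintext))
}
```

The overhead is constant per value, so it weighs most on workloads dominated by very small values.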
## Open questions for the reviewer (§11)
1. **Sub-tag byte choice**: `0x01` (natural next value) vs `0x10` (group
"cutover" sub-tags numerically away from "rotate" sub-tags). Doc picks
`0x01`.
2. **Idempotency response code**: `AlreadyExists` (different CLI
message) vs `OK` (same end state). Doc picks `AlreadyExists`.
3. **Fan-out helper location**: `internal/admin/` (matches existing
`config.go` precedent) vs `internal/encryption/`. Doc picks
`internal/admin/` because Stage 6E + the 6C-3 startup guard both reuse
it.
If the reviewer disagrees with any of these, the discussion lives on
this PR; subsequent implementation PRs will follow whatever this PR's
review decides.
## Test plan
- [x] Doc renders cleanly on GitHub
- [x] Cross-references to parent design + 6C-2d PR #784 are valid
- [ ] No code changes; CI is doc-only
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Documentation**
  * Added design documentation describing an end-to-end cutover to enable
    cluster-wide storage envelope encryption. Covers leader-driven
    activation, pre-flight capability checks, idempotent retry semantics, a
    recorded cutover index for consistency, write-path behavior when
    encryption is active, failure/refusal modes, and a comprehensive testing
    strategy (unit/property/Jepsen).
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
1 file changed: 1009 additions & 0 deletions