
Commit 7a05466

docs(encryption): Stage 6D-1 proposed design — enable-storage-envelope cutover (#786)
## Summary

Doc-only PR. Stage 6D-1 per the [parent design's milestone table](https://github.com/bootjp/elastickv/blob/main/docs/design/2026_04_29_partial_data_at_rest_encryption.md), now unblocked by PR #784 (6C-2d). Per `CLAUDE.md`'s design-doc-first workflow, this doc lands BEFORE any 6D implementation so that the decomposition, wire-format choice, RPC contract, capability fan-out helper, storage-layer toggle, bundled 6C-3 startup guards, and refusal posture can be reviewed together as one self-contained record.

## What this ships

`docs/design/2026_05_18_proposed_6d_enable_storage_envelope.md`: the Stage 6D as-implemented design. The parent design covers 6D abstractly in §6.6 and §7.1; this doc pins down the concrete sub-PR decomposition, the wire-format byte choice, and the refusal posture.

## Sub-PR decomposition

| Sub-PR | Surface | Operator-visible? |
|---|---|---|
| **6D-1** | This design doc | No (doc-only) |
| **6D-2** | 6C-3 startup guards (`ErrNodeIDCollision` + `ErrLocalEpochRollback`) | No (startup-time guards) |
| **6D-3** | Capability fan-out helper in `internal/admin/` | No (library) |
| **6D-4** | Wire format: `RotateSubEnableStorageEnvelope = 0x01` + `ApplyRotation` sub-tag dispatch | No (only writes a flag in the sidecar) |
| **6D-5** | §6.2 storage-layer toggle: `PutAt` reads `StorageEnvelopeActive` | No (toggle stays false until 6D-6) |
| **6D-6** | `EnableStorageEnvelope` admin RPC + CLI + integration test | **Yes**: first user-visible cutover |

Each sub-PR is independently revertible. 6D-2 through 6D-5 land operator-inert: the cutover is not reachable from any CLI until 6D-6 wires it up.

## Highlights

- **Wire format**: new sub-tag `RotateSubEnableStorageEnvelope = 0x01` on the existing 0x05 OpRotation payload; no new opcode. The entry is validated at both propose AND apply time: `Purpose == PurposeStorage`, `Wrapped == nil`, `DEKID == sidecar.Active.Storage`, and `sidecar.StorageEnvelopeActive == false` (the idempotency guard for retried RPCs).
- **RPC**: leader-only `EncryptionAdmin.EnableStorageEnvelope`. Followers return `FailedPrecondition` with a leader hint (same as `RotateDEK`). The failure-mode table in §8 lists five typed sentinels: three new ones (`ErrEncryptionNotBootstrapped`, `ErrStorageEnvelopeAlreadyActive`, `ErrCapabilityCheckFailed`) plus the existing `ErrEncryptionMutatorsDisabled` and `ErrNotLeader`.
- **Voters ∪ Learners fan-out** in `internal/admin/capability_fanout.go`. Members are probed fresh on every call, with no caching, per the parent design's "stale cached capability state cannot trigger a premature cutover" rationale. Default timeout: one `RaftElectionTimeout`.
- **6C-3 guards bundled into 6D-2**:
  - `ErrNodeIDCollision`: refuses startup if two members hash to the same 16-bit `node_id` (catches the GCM nonce-reuse risk).
  - `ErrLocalEpochRollback`: refuses startup if the sidecar's `local_epoch` is strictly less than the writer-registry record (catches a sidecar restored from an old backup).
- **§6.2 storage-layer toggle**: `store/mvcc_store.go::Put` reads `StorageEnvelopeActive` at write time. Pre-cutover versions stay cleartext; post-cutover writes are encrypted with `encryption_state = 0b01`; mixed versions within a single key are correct by construction per §5.4 of the parent doc.

## Five-lens self-review of THIS doc

1. **Data loss**: net-neutral. The cutover neither retires DEKs nor deletes values.
2. **Concurrency**: apply is deterministic, so every replica flips `StorageEnvelopeActive` at the same apply index. The pre-flight fan-out is fresh for each cutover.
3. **Performance**: each Put adds one AES-GCM seal, a 16-byte tag, and a 1-byte header (already benchmarked).
4. **Data consistency**: the cutover entry's apply index is recorded in `RaftAppliedIndex` via the 6C-2d advancement; the §9.1 guard then refuses startup if the sidecar on a later restart is missing the cutover entry.
5. **Test coverage**: five new unit-test surfaces (one per code sub-PR) plus a Jepsen extension at 6D-6.

## Open questions for the reviewer (§11)

1. **Sub-tag byte choice**: `0x01` (the natural next value) vs `0x10` (groups "cutover" sub-tags numerically away from "rotate" sub-tags). The doc picks `0x01`.
2. **Idempotency response code**: `AlreadyExists` (distinct CLI message) vs `OK` (same end state). The doc picks `AlreadyExists`.
3. **Fan-out helper location**: `internal/admin/` (matches the existing `config.go` precedent) vs `internal/encryption/`. The doc picks `internal/admin/` because Stage 6E and the 6C-3 startup guard both reuse it.

If the reviewer disagrees with any of these, the discussion belongs on this PR; subsequent implementation PRs will follow whatever this PR's review decides.

## Test plan

- [x] Doc renders cleanly on GitHub
- [x] Cross-references to parent design + 6C-2d PR #784 are valid
- [ ] No code changes; CI is doc-only

## Summary by CodeRabbit

* **Documentation**
  * Added design documentation describing an end-to-end cutover to enable cluster-wide storage envelope encryption. Covers leader-driven activation, pre-flight capability checks, idempotent retry semantics, a recorded cutover index for consistency, write-path behavior when encryption is active, failure/refusal modes, and a comprehensive testing strategy (unit/property/Jepsen).
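The §6.2 toggle and the mixed-version read path can be illustrated with Go's standard `crypto/cipher` AES-GCM. Everything beyond `StorageEnvelopeActive` and `encryption_state = 0b01` is an assumption: the hypothetical `envelopeWriter` type, a one-byte header carrying the state, and a 12-byte nonce built from `node_id`, a 16-bit `local_epoch`, and a counter. The PR's cost figure counts the 16-byte tag and 1-byte header; this self-contained sketch also stores the nonce inline, so its per-value overhead is 1 + 12 + 16 = 29 bytes. The real on-disk layout is defined by the parent design.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"encoding/binary"
	"errors"
)

// encryption_state values; only 0b01 comes from the design. Storing the
// state in a one-byte header is an assumption of this sketch.
const (
	stateCleartext byte = 0b00
	stateEnvelope  byte = 0b01
)

// envelopeWriter is a hypothetical stand-in for the store's write path.
// It is not safe for concurrent use (the nonce counter is unsynchronized).
type envelopeWriter struct {
	active  func() bool // reads StorageEnvelopeActive at write time
	aead    cipher.AEAD
	nodeID  uint16
	epoch   uint16
	counter uint64
}

// nonce builds node_id || local_epoch || counter (hypothetical layout).
// Two writers sharing a node_id and epoch would emit identical nonces under
// the same DEK, which is the risk ErrNodeIDCollision guards against.
func (w *envelopeWriter) nonce() []byte {
	n := make([]byte, 12)
	binary.BigEndian.PutUint16(n[0:2], w.nodeID)
	binary.BigEndian.PutUint16(n[2:4], w.epoch)
	binary.BigEndian.PutUint64(n[4:12], w.counter)
	w.counter++
	return n
}

// encode returns the stored bytes for one value version.
func (w *envelopeWriter) encode(plain []byte) []byte {
	if !w.active() {
		return append([]byte{stateCleartext}, plain...)
	}
	n := w.nonce()
	out := append([]byte{stateEnvelope}, n...)
	return w.aead.Seal(out, n, plain, nil) // appends ciphertext + 16-byte tag
}

// decode dispatches on the per-version header, which is what lets
// cleartext and encrypted versions coexist under a single key.
func (w *envelopeWriter) decode(stored []byte) ([]byte, error) {
	if len(stored) == 0 {
		return nil, errors.New("empty value")
	}
	switch stored[0] {
	case stateCleartext:
		return stored[1:], nil
	case stateEnvelope:
		ns := w.aead.NonceSize()
		if len(stored) < 1+ns {
			return nil, errors.New("truncated envelope")
		}
		return w.aead.Open(nil, stored[1:1+ns], stored[1+ns:], nil)
	default:
		return nil, errors.New("unknown encryption_state")
	}
}

// newGCM wraps a raw DEK in AES-GCM with the standard 12-byte nonce.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}
```

A version written before the flag flips keeps its cleartext header and remains readable afterwards, so no rewrite of existing data is needed at cutover time in this model.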
2 parents 87b6f2a + a990e7a commit 7a05466

1 file changed

Lines changed: 1009 additions & 0 deletions
