Skip to content

Commit 0a82420

Browse files
committed
docs(encryption): address PR707 round-7 (off-by-one, peer_metadata.go reference, post-cutover gate)
Critical (claude[bot] r5 #1): FSM Apply pseudocode used "if index >= raft_envelope_cutover_index" which would attempt to unwrap the enable-raft-envelope flag entry itself (written in legacy framing per step 7) and fail. Change to strict ">" so the flag entry is processed in legacy mode -- which is what records cutover_index in the sidecar -- and only subsequent entries get the raft envelope unwrap. Added an inline comment in the pseudocode explaining the off-by-one risk. Medium (claude[bot] r5 #3): retracted the wrong reference to internal/raftengine/etcd/peer_metadata.go (that file is the etcd-raft-peers.bin persistence layer, not a heartbeat extension). Replaced with an out-of-band proto.EncryptionAdmin.GetCapability gRPC fanned-out by the cutover commands; reuses the existing admin-listener TLS pool. §6.1 gains an admin.go entry for the new service. Minor (claude[bot] r5 #4): post-cutover membership gate. Without it, an operator adding a new voter on an old binary post-cutover would silently drop encrypted reads (data[0] != 0 read as tombstone) and fail every Phase-2 raft envelope apply. Add an explicit §7.1 sub-section: control plane refuses AddVoter / ConfChange / learner promotion for any peer whose GetCapability does not return encryption_capable=true once the cluster flags are active. ErrPeerNotEncryptionCapable surfaced at the admin RPC.
1 parent e887e9f commit 0a82420

1 file changed

Lines changed: 89 additions & 24 deletions

File tree

docs/design/2026_04_29_proposed_data_at_rest_encryption.md

Lines changed: 89 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -783,6 +783,15 @@ rewrite."
783783
`os.Rename``dir.Sync()` → ack). Skipping any of these
784784
fsyncs is a data-loss-class bug per §10 lens 1; do not
785785
shortcut to a plain `os.Rename`.
786+
- `admin.go` — gRPC server for the new `proto.EncryptionAdmin`
787+
service. v1 surface: `GetCapability(Empty) → CapabilityReport
788+
{ encryption_capable bool, build_sha string,
789+
sidecar_present bool }`. Used by the §7.1 cutover commands
790+
to fan-out a fresh capability poll across every voting
791+
member of every Raft group; reuses the existing admin-listener
792+
TLS configuration. The RPC is read-only and side-effect-free,
793+
so it can be served unconditionally on every node regardless
794+
of leadership state.
786795

787796
### 6.2 Hooks into the storage layer
788797

@@ -1007,31 +1016,43 @@ flag becomes a *capability* assertion, not a *behaviour* trigger.
10071016
envelope is gated on the cluster flag below) and proposes
10081017
cleartext Raft entries (raft envelope is gated on a separate
10091018
flag in Phase 2).
1010-
2. Each upgraded node advertises `encryption_capable = true` in
1011-
the **peer-metadata extension** that today already piggy-backs
1012-
on the etcd-raft heartbeat path
1013-
(`internal/raftengine/etcd/peer_metadata.go`). A new
1014-
`encryption_capable bool` field is added to the existing
1015-
`PeerMetadata` proto; old binaries omit the field, which
1016-
protobuf decodes as the zero value (`false`), so unscanned
1017-
non-upgraded peers default to `not capable` without any
1018-
special handling. The flag is purely informational at this
1019-
point.
1019+
2. Each upgraded node advertises `encryption_capable = true`
1020+
via an **out-of-band gRPC** rather than by piggybacking on
1021+
the raft heartbeat path. (An earlier draft of this doc
1022+
referenced `internal/raftengine/etcd/peer_metadata.go` as
1023+
the extension point; that file is the on-disk peers-file
1024+
persistence layer for `etcd-raft-peers.bin`, not a
1025+
heartbeat extension. The reference has been retracted.) The
1026+
capability advert lives on a new
1027+
`EncryptionAdmin.GetCapability` unary RPC served by every
1028+
node (`proto.EncryptionAdmin`, see §6.1). The RPC returns a
1029+
`CapabilityReport { encryption_capable bool, build_sha
1030+
string, sidecar_present bool }` so the operator can
1031+
distinguish "binary supports encryption but flag not yet
1032+
set" from "binary is too old to support encryption at all".
10201033
3. The Phase-1 and Phase-2 cutover commands
10211034
(`enable-storage-envelope`, `enable-raft-envelope`) **do not**
1022-
read cached `encryption_capable` state from the local
1023-
keystore. Instead they force a fresh poll: the leader sends a
1024-
heartbeat to every voting member of every Raft group and
1025-
blocks until it has received a heartbeat response carrying
1026-
`encryption_capable = true` from every member, all within one
1027-
election timeout (configurable; default the existing
1035+
read cached capability state. Instead they force a fresh
1036+
poll: the leader of the default Raft group fans out a
1037+
`GetCapability` call to every voting member of every Raft
1038+
group in the route catalog and blocks until it has received
1039+
`encryption_capable = true` from every member, all within
1040+
one election timeout (configurable; default the existing
10281041
`RaftElectionTimeout`). If any member is unreachable or
10291042
reports `false` within the window, the command returns
10301043
`ErrCapabilityCheckFailed` with the offending node IDs and
10311044
the operator must re-run after fixing them. Stale cached
10321045
capability state therefore cannot trigger a premature
10331046
cutover.
10341047

1048+
The fan-out reuses the existing admin connection pool
1049+
(already TLS-authenticated for `--adminTLSCertFile` setups)
1050+
so no new transport machinery is required. Choosing the
1051+
out-of-band RPC over a heartbeat-context extension keeps
1052+
the cutover decision a control-plane operation rather than
1053+
a steady-state hot-path payload, and avoids changing the
1054+
etcd-raft message wire format.
1055+
10351056
**Phase 1 — Storage envelope cluster-flag flip.**
10361057

10371058
4. Operator runs `elastickv-admin encryption enable-storage-envelope`.
@@ -1062,17 +1083,27 @@ flag becomes a *capability* assertion, not a *behaviour* trigger.
10621083
`raft_envelope_active = true`. This entry itself uses the
10631084
legacy first-byte tag space so it remains decodable by any
10641085
replica that has applied entries up to it.
1065-
8. From the apply index of that flag entry onward, every leader
1066-
wraps new proposal `Data []byte` with the raft DEK (§4.2).
1067-
**There is no in-band format tag in the proposal payload.**
1068-
Replicas dispatch on the **Raft log index** of the entry
1069-
relative to the persisted `raft_envelope_cutover_index` (the
1070-
apply index of the flag entry, recorded in the local sidecar
1071-
on apply). Concretely, the FSM apply path becomes:
1086+
8. From the **next** entry after the flag entry's apply index
1087+
onward, every leader wraps new proposal `Data []byte` with
1088+
the raft DEK (§4.2). The flag entry itself is **not**
1089+
wrapped — step 7 wrote it in legacy first-byte framing so
1090+
every replica (including a defensive non-upgraded one) can
1091+
still apply it. **There is no in-band format tag in the
1092+
proposal payload.** Replicas dispatch on the **Raft log
1093+
index** of the entry relative to the persisted
1094+
`raft_envelope_cutover_index` (the apply index of the flag
1095+
entry, recorded in the local sidecar on apply). Concretely,
1096+
the FSM apply path becomes:
10721097

10731098
```text
10741099
Apply(index, data):
1075-
if index >= raft_envelope_cutover_index:
1100+
// Strict greater-than: the flag entry at index ==
1101+
// raft_envelope_cutover_index is itself in legacy framing.
1102+
// Unwrapping it as a raft envelope would fail GCM
1103+
// verification and either error the apply or, on
1104+
// best-effort handlers, silently drop the cutover update
1105+
// and leave the local sidecar in Phase 1 forever.
1106+
if index > raft_envelope_cutover_index:
10761107
data = raftDEK.Unwrap(data) // strip §4.2 envelope
10771108
// existing dispatch on data[0], unchanged from today:
10781109
switch data[0] {
@@ -1118,6 +1149,40 @@ only encryption-state-`0b01` MVCC versions plus a Phase-2-flagged
11181149
header so the receiver knows to expect raft envelopes from there
11191150
on.
11201151

1152+
#### Post-cutover membership gate
1153+
1154+
The Phase 0/1/2 capability check guards the **initial** rollout,
1155+
but operators routinely add or replace voters after the cluster is
1156+
in production. A new voter brought up on an old binary without
1157+
`--encryption-enabled` would, post-cutover, hit two distinct
1158+
silent failure modes:
1159+
1160+
- *At the storage layer:* MVCC versions arriving via Raft
1161+
replication or snapshot ingest carry `encryption_state = 0b01`
1162+
in `data[0]`. The old binary reads `data[0] != 0` as a
1163+
tombstone and silently drops every live value from reads.
1164+
- *At the raft layer:* every Phase-2 entry is a raft envelope.
1165+
Apply on the new voter fails immediately and the voter never
1166+
catches up.
1167+
1168+
Both modes happen with no operator-visible error until clients
1169+
start seeing missing data on reads routed to the new node.
1170+
1171+
To prevent this, the control plane (the leader of the default
1172+
Raft group) **rejects `AddVoter` / `ConfChange` / learner
1173+
promotion** for any peer whose `EncryptionAdmin.GetCapability`
1174+
RPC does not return `encryption_capable = true`, once
1175+
`storage_envelope_active` or `raft_envelope_active` has been
1176+
applied locally. The `Distribution` admin RPC layer surfaces this
1177+
as `ErrPeerNotEncryptionCapable` so the operator sees a clear
1178+
refusal at the moment they issue the membership change, not on
1179+
the first failed read.
1180+
1181+
The same gate applies to learners promoted to voters: a learner
1182+
that joined before cutover but never restarted with the
1183+
encryption flag must be restarted (or `EncryptionAdmin.GetCapability`
1184+
must report capable) before promotion succeeds.
1185+
11211186
### 7.2 Why we will not support encrypted → cleartext
11221187

11231188
Toggling encryption off would require rewriting every value while

0 commit comments

Comments
 (0)