@@ -783,6 +783,15 @@ rewrite."
783783 ` os.Rename ` → ` dir.Sync() ` → ack). Skipping any of these
784784 fsyncs is a data-loss-class bug per §10 lens 1; do not
785785 shortcut to a plain ` os.Rename ` .
786+ - ` admin.go ` — gRPC server for the new ` proto.EncryptionAdmin `
787+ service. v1 surface: `GetCapability(Empty) → CapabilityReport
788+ { encryption_capable bool, build_sha string,
789+ sidecar_present bool }`. Used by the §7.1 cutover commands
790+ to fan-out a fresh capability poll across every voting
791+ member of every Raft group; reuses the existing admin-listener
792+ TLS configuration. The RPC is read-only and side-effect-free,
793+ so it can be served unconditionally on every node regardless
794+ of leadership state.
786795
787796### 6.2 Hooks into the storage layer
788797
@@ -1007,31 +1016,43 @@ flag becomes a *capability* assertion, not a *behaviour* trigger.
10071016 envelope is gated on the cluster flag below) and proposes
10081017 cleartext Raft entries (raft envelope is gated on a separate
10091018 flag in Phase 2).
1010- 2 . Each upgraded node advertises ` encryption_capable = true ` in
1011- the ** peer-metadata extension** that today already piggy-backs
1012- on the etcd-raft heartbeat path
1013- (` internal/raftengine/etcd/peer_metadata.go ` ). A new
1014- ` encryption_capable bool ` field is added to the existing
1015- ` PeerMetadata ` proto; old binaries omit the field, which
1016- protobuf decodes as the zero value (` false ` ), so unscanned
1017- non-upgraded peers default to ` not capable ` without any
1018- special handling. The flag is purely informational at this
1019- point.
1019+ 2 . Each upgraded node advertises ` encryption_capable = true `
1020+ via an ** out-of-band gRPC** rather than by piggybacking on
1021+ the raft heartbeat path. (An earlier draft of this doc
1022+ referenced ` internal/raftengine/etcd/peer_metadata.go ` as
1023+ the extension point; that file is the on-disk peers-file
1024+ persistence layer for ` etcd-raft-peers.bin ` , not a
1025+ heartbeat extension. The reference has been retracted.) The
1026+ capability advert lives on a new
1027+ ` EncryptionAdmin.GetCapability ` unary RPC served by every
1028+ node (` proto.EncryptionAdmin ` , see §6.1). The RPC returns a
1029+ `CapabilityReport { encryption_capable bool, build_sha
1030+ string, sidecar_present bool }` so the operator can
1031+ distinguish "binary supports encryption but flag not yet
1032+ set" from "binary is too old to support encryption at all".
102010333 . The Phase-1 and Phase-2 cutover commands
10211034 (` enable-storage-envelope ` , ` enable-raft-envelope ` ) ** do not**
1022- read cached ` encryption_capable ` state from the local
1023- keystore. Instead they force a fresh poll: the leader sends a
1024- heartbeat to every voting member of every Raft group and
1025- blocks until it has received a heartbeat response carrying
1026- ` encryption_capable = true ` from every member, all within one
1027- election timeout (configurable; default the existing
1035+ read cached capability state. Instead they force a fresh
1036+ poll: the leader of the default Raft group fans out a
1037+ ` GetCapability ` call to every voting member of every Raft
1038+ group in the route catalog and blocks until it has received
1039+ ` encryption_capable = true ` from every member, all within
1040+ one election timeout (configurable; default the existing
10281041 ` RaftElectionTimeout ` ). If any member is unreachable or
10291042 reports ` false ` within the window, the command returns
10301043 ` ErrCapabilityCheckFailed ` with the offending node IDs and
10311044 the operator must re-run after fixing them. Stale cached
10321045 capability state therefore cannot trigger a premature
10331046 cutover.
10341047
1048+ The fan-out reuses the existing admin connection pool
1049+ (already TLS-authenticated for ` --adminTLSCertFile ` setups)
1050+ so no new transport machinery is required. Choosing the
1051+ out-of-band RPC over a heartbeat-context extension keeps
1052+ the cutover decision a control-plane operation rather than
1053+ a steady-state hot-path payload, and avoids changing the
1054+ etcd-raft message wire format.
1055+
10351056** Phase 1 — Storage envelope cluster-flag flip.**
10361057
103710584 . Operator runs ` elastickv-admin encryption enable-storage-envelope ` .
@@ -1062,17 +1083,27 @@ flag becomes a *capability* assertion, not a *behaviour* trigger.
10621083 ` raft_envelope_active = true ` . This entry itself uses the
10631084 legacy first-byte tag space so it remains decodable by any
10641085 replica that has applied entries up to it.
1065- 8 . From the apply index of that flag entry onward, every leader
1066- wraps new proposal ` Data []byte ` with the raft DEK (§4.2).
1067- ** There is no in-band format tag in the proposal payload.**
1068- Replicas dispatch on the ** Raft log index** of the entry
1069- relative to the persisted ` raft_envelope_cutover_index ` (the
1070- apply index of the flag entry, recorded in the local sidecar
1071- on apply). Concretely, the FSM apply path becomes:
1086+ 8 . From the ** next** entry after the flag entry's apply index
1087+ onward, every leader wraps new proposal ` Data []byte ` with
1088+ the raft DEK (§4.2). The flag entry itself is ** not**
1089+ wrapped — step 7 wrote it in legacy first-byte framing so
1090+ every replica (including a defensive non-upgraded one) can
1091+ still apply it. ** There is no in-band format tag in the
1092+ proposal payload.** Replicas dispatch on the ** Raft log
1093+ index** of the entry relative to the persisted
1094+ ` raft_envelope_cutover_index ` (the apply index of the flag
1095+ entry, recorded in the local sidecar on apply). Concretely,
1096+ the FSM apply path becomes:
10721097
10731098 ``` text
10741099 Apply(index, data):
1075- if index >= raft_envelope_cutover_index:
1100+ // Strict greater-than: the flag entry at index ==
1101+ // raft_envelope_cutover_index is itself in legacy framing.
1102+ // Unwrapping it as a raft envelope would fail GCM
1103+ // verification and either error the apply or, on
1104+ // best-effort handlers, silently drop the cutover update
1105+ // and leave the local sidecar in Phase 1 forever.
1106+ if index > raft_envelope_cutover_index:
10761107 data = raftDEK.Unwrap(data) // strip §4.2 envelope
10771108 // existing dispatch on data[0], unchanged from today:
10781109 switch data[0] {
@@ -1118,6 +1149,40 @@ only encryption-state-`0b01` MVCC versions plus a Phase-2-flagged
11181149header so the receiver knows to expect raft envelopes from there
11191150on.
11201151
1152+ #### Post-cutover membership gate
1153+
1154+ The Phase 0/1/2 capability check guards the ** initial** rollout,
1155+ but operators routinely add or replace voters after the cluster is
1156+ in production. A new voter brought up on an old binary without
1157+ ` --encryption-enabled ` would, post-cutover, hit two distinct
1158+ silent failure modes:
1159+
1160+ - * At the storage layer:* MVCC versions arriving via Raft
1161+ replication or snapshot ingest carry ` encryption_state = 0b01 `
1162+ in ` data[0] ` . The old binary reads ` data[0] != 0 ` as a
1163+ tombstone and silently drops every live value from reads.
1164+ - * At the raft layer:* every Phase-2 entry is a raft envelope.
1165+ Apply on the new voter fails immediately and the voter never
1166+ catches up.
1167+
1168+ Both modes happen with no operator-visible error until clients
1169+ start seeing missing data on reads routed to the new node.
1170+
1171+ To prevent this, the control plane (the leader of the default
1172+ Raft group) ** rejects ` AddVoter ` / ` ConfChange ` / learner
1173+ promotion** for any peer whose ` EncryptionAdmin.GetCapability `
1174+ RPC does not return ` encryption_capable = true ` , once
1175+ ` storage_envelope_active ` or ` raft_envelope_active ` has been
1176+ applied locally. The ` Distribution ` admin RPC layer surfaces this
1177+ as ` ErrPeerNotEncryptionCapable ` so the operator sees a clear
1178+ refusal at the moment they issue the membership change, not on
1179+ the first failed read.
1180+
1181+ The same gate applies to learners promoted to voters: a learner
1182+ that joined before cutover but never restarted with the
1183+ encryption flag must be restarted (or ` EncryptionAdmin.GetCapability `
1184+ must report capable) before promotion succeeds.
1185+
11211186### 7.2 Why we will not support encrypted → cleartext
11221187
11231188Toggling encryption off would require rewriting every value while
0 commit comments