Commit 008eb81

feat(encryption): foundation package (Stage 0) (#719)
## Summary

Stage 0 of the data-at-rest encryption rollout from PR #707 (`docs/design/2026_04_29_proposed_data_at_rest_encryption.md`). This PR is library-only: no integration with existing storage / raft / FSM layers. Those land in Stages 1–9.

New package `internal/encryption/`:

- `cipher.go`: AES-256-GCM `Encrypt` / `Decrypt` primitive over a `Keystore`. Caller-supplied AAD bytes (no AAD composition baked in; storage and raft layers compose their own AAD per §4.1 / §4.2 in later stages). `Decrypt` wraps GCM tag mismatch as `ErrIntegrity` with the original `Open` error attached as a secondary cause, so callers match via `errors.Is` and never silently zero or retry.
- `envelope.go`: §4.1 wire format encoder/decoder. Layout: `[version 1B][flag 1B][key_id 4B BE][nonce 12B][ciphertext+tag]`. `EnvelopeOverhead = 34` bytes. `HeaderAADBytes` helper exposed for storage callers that compose `AAD = HeaderAADBytes ‖ pebble_key`. `ReservedKeyID = 0` sentinel per §5.1.
- `keystore.go`: copy-on-write `atomic.Pointer[map[uint32][]byte]` DEK store. `Get` is a single atomic load (no mutex on the hot path per §10 self-review lens 2). `Set` rejects `ReservedKeyID` and non-32-byte DEKs; copies input.
- `errors.go`: typed errors: `ErrUnknownKeyID`, `ErrReservedKeyID`, `ErrBadNonceSize`, `ErrBadKeySize`, `ErrIntegrity`, `ErrEnvelopeShort`, `ErrEnvelopeVersion`.
- `kek/kek.go`: `Wrapper` interface (`Wrap` / `Unwrap` / `Name`). KMS-backed providers (AWS KMS, GCP KMS, Vault) come in Stage 9.
- `kek/file.go`: `FileWrapper`: AES-256-GCM under a 32-byte KEK read from a file. Output `[12B nonce][32B ct][16B tag] = 60 bytes`. Validates lengths instead of padding/truncating.

Tests:

- `cipher_test.go`: round-trip across plaintext/AAD shapes; tag/AAD/ciphertext/nonce tamper rejected with `ErrIntegrity`; typed error checks for reserved/bad-nonce/unknown-key cases; distinct nonces produce distinct ciphertexts.
- `cipher_prop_test.go`: `pgregory.net/rapid` property tests for arbitrary plaintext/AAD round-trip and single-bit AAD-flip rejection (the §4.1 cut-and-paste defence property).
- `envelope_test.go`: encode/decode round-trip; under-length input rejected; unknown version bytes rejected; decode does not alias input.
- `keystore_test.go`: Set/Get/Delete/IDs/Len; concurrent reader/writer stress under `-race`.
- `kek/file_test.go`: round-trip; distinct Wrap nonces; bad-length rejection; tag/nonce tamper rejected; missing file.

## Self-review (per CLAUDE.md 5 lenses)

1. **Data loss**: Pure library, no persistence yet. `ErrIntegrity` always propagated. Round-trip property test guards envelope-format bugs.
2. **Concurrency**: `atomic.Pointer[map]` for keystore; concurrent stress test under `-race`.
3. **Performance**: AES-NI via `crypto/aes`. No allocations beyond per-call ciphertext buffer (sync.Pool optimisation deferred to integration stages).
4. **Data consistency**: No MVCC/OCC interaction yet.
5. **Test coverage**: 23 unit tests + 2 property tests, `-race` clean, 0 lint issues against project `.golangci.yaml`.

## Test plan

- [x] `go test ./internal/encryption/...` (~14s)
- [x] `go test -race ./internal/encryption/...` (~65s)
- [x] `golangci-lint run ./internal/encryption/...` (0 issues)
- [ ] Reviewer: confirm `Cipher` API does not bake storage/raft AAD assumptions in (§4.1 vs §4.2 layouts must remain orthogonal).
- [ ] Reviewer: confirm `ReservedKeyID = 0` is rejected at all entry points (`Cipher.Encrypt`, `Cipher.Decrypt`, `Keystore.Set`).
<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

## New Features

* Added AES-256-GCM encryption support with envelope-based encryption format
* Implemented key management system for secure encryption key handling
* Added file-based key encryption key (KEK) support for key wrapping and unwrapping
* Introduced comprehensive error handling for encryption operations including integrity verification

## Tests

* Added extensive unit, benchmark, and property-based tests for encryption functionality

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2 parents cd5bcf6 + 5aa2e93 commit 008eb81

16 files changed

Lines changed: 2251 additions & 0 deletions

internal/encryption/cipher.go

Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
package encryption

import (
	"crypto/cipher"

	"github.com/cockroachdb/errors"
)

// Cipher is the AES-256-GCM primitive over a Keystore.
//
// Cipher does NOT compose AAD: callers in store/ (§4.1 AAD) and
// internal/raftengine/etcd/ (§4.2 AAD) supply the full AAD bytes. This
// keeps the cipher narrow and lets each layer choose the right AAD
// without baking storage/raft assumptions into the foundation package.
//
// AES key expansion and GCM initialization happen once per DEK at
// Keystore.Set time; the hot path only needs an atomic.Pointer load
// and a Seal/Open call.
//
// The zero value is NOT safe to use: Encrypt/Decrypt return
// ErrNilKeystore for a zero-value or nil Cipher rather than nil-deref
// panicking. Always construct via NewCipher.
type Cipher struct {
	keystore *Keystore
}

// NewCipher returns a Cipher backed by ks.
//
// Returns ErrNilKeystore if ks is nil. Catching this at construction
// time turns a wiring mistake into a typed error during process
// startup or DEK rotation, rather than a nil-deref panic on the first
// Encrypt/Decrypt; important for the dynamic dependency-wiring paths
// where the encryption stack may be re-initialised after a sidecar
// resync (§5.5) or rotation (§5.2).
func NewCipher(ks *Keystore) (*Cipher, error) {
	if ks == nil {
		return nil, errors.WithStack(ErrNilKeystore)
	}
	return &Cipher{keystore: ks}, nil
}

// Encrypt produces (ciphertext ‖ tag) for plaintext under the DEK
// identified by keyID and the supplied nonce. aad is treated verbatim.
//
// Constraints:
//   - keyID must not be ReservedKeyID; otherwise ErrReservedKeyID.
//   - nonce must be NonceSize bytes; otherwise ErrBadNonceSize.
//   - keyID must be present in the Keystore; otherwise ErrUnknownKeyID.
//
// CRITICAL: callers MUST NOT reuse the same (keyID, nonce) pair with
// any two distinct plaintexts. Nonce reuse under AES-GCM is
// catastrophic: it leaks the XOR of the two plaintexts and enables
// authentication-key recovery. The §4.1 storage-layer integration
// uses the nonce construction (node_id ‖ local_epoch ‖ write_count)
// to guarantee uniqueness by construction; do not substitute a
// different nonce scheme in that layer without a corresponding
// uniqueness proof. (For tests / benchmarks, fresh crypto/rand
// nonces are perfectly safe.)
//
// The returned slice has length len(plaintext) + TagSize. It is
// freshly allocated; the caller may retain it indefinitely.
func (c *Cipher) Encrypt(plaintext, aad []byte, keyID uint32, nonce []byte) ([]byte, error) {
	aead, err := c.aeadFor(keyID, nonce)
	if err != nil {
		return nil, err
	}
	return aead.Seal(nil, nonce, plaintext, aad), nil
}

// Decrypt verifies and decrypts (ciphertext ‖ tag) using the DEK
// identified by keyID, the supplied nonce, and the same aad bytes that
// were passed to Encrypt.
//
// On GCM tag mismatch, Decrypt returns an error wrapping ErrIntegrity.
// Per §4.1, callers MUST treat this as a typed read error and never
// silently zero or retry. The original Open error is attached as a
// secondary cause for diagnostic logging.
func (c *Cipher) Decrypt(ciphertextAndTag, aad []byte, keyID uint32, nonce []byte) ([]byte, error) {
	aead, err := c.aeadFor(keyID, nonce)
	if err != nil {
		return nil, err
	}
	plaintext, err := aead.Open(nil, nonce, ciphertextAndTag, aad)
	if err != nil {
		// Wrap ErrIntegrity (the typed error callers match via errors.Is)
		// and attach the original GCM Open error as a secondary cause for
		// diagnostic logging. Per §4.1, callers MUST treat this as a
		// typed read error and never silently zero or retry.
		return nil, errors.Wrap(
			errors.WithSecondaryError(ErrIntegrity, err),
			"encryption: aead.Open",
		)
	}
	return plaintext, nil
}

// aeadFor validates keyID and nonce length, then returns the
// pre-initialized AEAD from the keystore. The hot path here is a single
// atomic.Pointer load + a map lookup; AES key expansion happened once
// at Keystore.Set time.
//
// A nil receiver or zero-value Cipher (i.e. c.keystore == nil) is
// rejected with ErrNilKeystore so a caller that bypasses NewCipher and
// uses var c encryption.Cipher gets a typed error instead of a
// nil-deref panic on the first Encrypt/Decrypt.
func (c *Cipher) aeadFor(keyID uint32, nonce []byte) (cipher.AEAD, error) {
	if c == nil || c.keystore == nil {
		return nil, errors.WithStack(ErrNilKeystore)
	}
	if keyID == ReservedKeyID {
		return nil, errors.WithStack(ErrReservedKeyID)
	}
	if len(nonce) != NonceSize {
		return nil, errors.Wrapf(ErrBadNonceSize, "got %d bytes, want %d", len(nonce), NonceSize)
	}
	aead, ok := c.keystore.AEAD(keyID)
	if !ok {
		return nil, errors.Wrapf(ErrUnknownKeyID, "key_id=%d", keyID)
	}
	return aead, nil
}
Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
package encryption_test

import (
	"crypto/rand"
	"strconv"
	"testing"

	"github.com/bootjp/elastickv/internal/encryption"
)

// benchPlaintextSizes covers the value-size range that storage / raft
// envelopes hit on the hot path: small (Redis counters, SET keys),
// medium (typical KV payloads, JSON), large (DynamoDB items, S3 small
// objects). 64 KiB caps the upper end so the benchmark stays under a
// few seconds; bigger plaintexts are AES-NI-bound and the per-byte
// throughput is already extracted at 64 KiB.
var benchPlaintextSizes = []int{64, 1024, 16 * 1024, 64 * 1024}

func setupBench(b *testing.B) (*encryption.Cipher, uint32, []byte) {
	b.Helper()
	ks := encryption.NewKeystore()
	dek := make([]byte, encryption.KeySize)
	if _, err := rand.Read(dek); err != nil {
		b.Fatalf("rand.Read dek: %v", err)
	}
	if err := ks.Set(testKeyID, dek); err != nil {
		b.Fatalf("Set: %v", err)
	}
	c, err := encryption.NewCipher(ks)
	if err != nil {
		b.Fatalf("NewCipher: %v", err)
	}
	nonce := make([]byte, encryption.NonceSize)
	if _, err := rand.Read(nonce); err != nil {
		b.Fatalf("rand.Read nonce: %v", err)
	}
	return c, testKeyID, nonce
}

// BenchmarkCipher_Encrypt verifies the post-keystore-redesign hot path:
// Set installed the AEAD once, so each Encrypt call should be a single
// atomic.Pointer load + AEAD.Seal.
func BenchmarkCipher_Encrypt(b *testing.B) {
	c, keyID, nonce := setupBench(b)
	aad := []byte("storage-aad-context")

	for _, size := range benchPlaintextSizes {
		plaintext := make([]byte, size)
		if _, err := rand.Read(plaintext); err != nil {
			b.Fatalf("rand.Read plaintext: %v", err)
		}
		b.Run(name(size), func(b *testing.B) {
			b.SetBytes(int64(size))
			b.ReportAllocs()
			b.ResetTimer()
			for i := 0; i < b.N; i++ {
				if _, err := c.Encrypt(plaintext, aad, keyID, nonce); err != nil {
					b.Fatal(err)
				}
			}
		})
	}
}

// BenchmarkCipher_Decrypt mirrors BenchmarkCipher_Encrypt for the read
// side. Each iteration runs AEAD.Open against the same ciphertext so we
// measure the steady-state cost rather than seeding overhead.
func BenchmarkCipher_Decrypt(b *testing.B) {
	c, keyID, nonce := setupBench(b)
	aad := []byte("storage-aad-context")

	for _, size := range benchPlaintextSizes {
		plaintext := make([]byte, size)
		if _, err := rand.Read(plaintext); err != nil {
			b.Fatalf("rand.Read plaintext: %v", err)
		}
		ct, err := c.Encrypt(plaintext, aad, keyID, nonce)
		if err != nil {
			b.Fatalf("Encrypt: %v", err)
		}
		b.Run(name(size), func(b *testing.B) {
			b.SetBytes(int64(size))
			b.ReportAllocs()
			b.ResetTimer()
			for i := 0; i < b.N; i++ {
				if _, err := c.Decrypt(ct, aad, keyID, nonce); err != nil {
					b.Fatal(err)
				}
			}
		})
	}
}

// BenchmarkKeystore_AEAD exercises the hot-path lookup in isolation.
// AEAD() should be a single atomic.Pointer load + map lookup; no
// allocations, no per-call AES key expansion.
func BenchmarkKeystore_AEAD(b *testing.B) {
	ks := encryption.NewKeystore()
	dek := make([]byte, encryption.KeySize)
	if _, err := rand.Read(dek); err != nil {
		b.Fatalf("rand.Read: %v", err)
	}
	if err := ks.Set(testKeyID, dek); err != nil {
		b.Fatalf("Set: %v", err)
	}
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if _, ok := ks.AEAD(testKeyID); !ok {
			b.Fatal("AEAD lookup missed")
		}
	}
}

// name returns a sub-benchmark label scaling with size for readable
// benchstat output ("64B", "1KiB", "16KiB", "64KiB").
func name(size int) string {
	switch {
	case size < 1024:
		return strconv.Itoa(size) + "B"
	case size < 1024*1024:
		return strconv.Itoa(size/1024) + "KiB"
	default:
		return strconv.Itoa(size/(1024*1024)) + "MiB"
	}
}
Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
package encryption_test

import (
	"bytes"
	"crypto/rand"
	"testing"

	"github.com/bootjp/elastickv/internal/encryption"
	"github.com/cockroachdb/errors"
	"pgregory.net/rapid"
)

// TestCipher_RoundTripProperty checks that for arbitrary plaintext, AAD,
// nonce, and key_id, Encrypt followed by Decrypt with the same inputs
// recovers the exact plaintext. CLAUDE.md flags this kind of test as the
// canonical safety net for envelope-format / compress-then-encrypt bugs.
func TestCipher_RoundTripProperty(t *testing.T) {
	rapid.Check(t, func(t *rapid.T) {
		ks := encryption.NewKeystore()
		// Random 32-byte DEK installed at a random non-reserved key_id.
		dek := make([]byte, encryption.KeySize)
		if _, err := rand.Read(dek); err != nil {
			t.Fatalf("rand.Read dek: %v", err)
		}
		keyID := rapid.Uint32Range(1, 0xFFFFFFFF).Draw(t, "keyID")
		if err := ks.Set(keyID, dek); err != nil {
			t.Fatalf("ks.Set: %v", err)
		}

		c, err := encryption.NewCipher(ks)
		if err != nil {
			t.Fatalf("NewCipher: %v", err)
		}
		plaintext := rapid.SliceOfN(rapid.Byte(), 0, 4096).Draw(t, "plaintext")
		aad := rapid.SliceOfN(rapid.Byte(), 0, 256).Draw(t, "aad")
		nonce := make([]byte, encryption.NonceSize)
		if _, err := rand.Read(nonce); err != nil {
			t.Fatalf("rand.Read nonce: %v", err)
		}

		ct, err := c.Encrypt(plaintext, aad, keyID, nonce)
		if err != nil {
			t.Fatalf("Encrypt: %v", err)
		}
		got, err := c.Decrypt(ct, aad, keyID, nonce)
		if err != nil {
			t.Fatalf("Decrypt: %v", err)
		}
		// Both empty plaintext and empty got should compare equal; treat
		// nil and empty as the same here so the property holds for the
		// 0-length boundary.
		if len(got) == 0 && len(plaintext) == 0 {
			return
		}
		if !bytes.Equal(got, plaintext) {
			t.Fatalf("plaintext mismatch:\n got %x\n want %x", got, plaintext)
		}
	})
}

// TestCipher_AADTamperProperty checks that any single-bit flip in the AAD
// at decrypt time causes ErrIntegrity. This is the property that backs the
// §4.1 cut-and-paste / blob-relocation defence.
func TestCipher_AADTamperProperty(t *testing.T) {
	rapid.Check(t, func(t *rapid.T) {
		ks := encryption.NewKeystore()
		dek := make([]byte, encryption.KeySize)
		if _, err := rand.Read(dek); err != nil {
			t.Fatalf("rand.Read dek: %v", err)
		}
		keyID := uint32(1)
		if err := ks.Set(keyID, dek); err != nil {
			t.Fatalf("ks.Set: %v", err)
		}
		c, err := encryption.NewCipher(ks)
		if err != nil {
			t.Fatalf("NewCipher: %v", err)
		}

		plaintext := rapid.SliceOfN(rapid.Byte(), 1, 256).Draw(t, "plaintext")
		// AAD must be at least 1 byte so we can tamper.
		aad := rapid.SliceOfN(rapid.Byte(), 1, 64).Draw(t, "aad")
		nonce := make([]byte, encryption.NonceSize)
		if _, err := rand.Read(nonce); err != nil {
			t.Fatalf("rand.Read nonce: %v", err)
		}

		ct, err := c.Encrypt(plaintext, aad, keyID, nonce)
		if err != nil {
			t.Fatalf("Encrypt: %v", err)
		}

		idx := rapid.IntRange(0, len(aad)-1).Draw(t, "idx")
		bit := rapid.Uint8Range(0, 7).Draw(t, "bit")
		bad := append([]byte(nil), aad...)
		bad[idx] ^= 1 << bit

		_, err = c.Decrypt(ct, bad, keyID, nonce)
		if !errors.Is(err, encryption.ErrIntegrity) {
			t.Fatalf("expected ErrIntegrity for tampered AAD, got %v", err)
		}
	})
}
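The AAD-tamper property can also be exercised outside the test harness. The sketch below swaps two things relative to the test above, and both are assumptions made to keep it self-contained: it walks every bit of a short AAD exhaustively instead of sampling with `pgregory.net/rapid`, and it wraps a local `errIntegrity` sentinel with the standard library's `fmt.Errorf("%w")` instead of `cockroachdb/errors`. The helper names (`newAEAD`, `open`, `flipRejected`) are hypothetical.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// errIntegrity stands in for encryption.ErrIntegrity in this sketch.
var errIntegrity = errors.New("sketch: integrity check failed")

// newAEAD builds AES-256-GCM under a fresh random key.
func newAEAD() (cipher.AEAD, error) {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		return nil, err
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// open maps any GCM Open failure onto the errIntegrity sentinel so
// callers can match it with errors.Is.
func open(aead cipher.AEAD, nonce, ct, aad []byte) ([]byte, error) {
	pt, err := aead.Open(nil, nonce, ct, aad)
	if err != nil {
		return nil, fmt.Errorf("%w: %v", errIntegrity, err)
	}
	return pt, nil
}

// flipRejected seals plaintext under aad with a fresh nonce, flips one
// bit of a copy of aad, and reports whether open fails with errIntegrity.
func flipRejected(aead cipher.AEAD, plaintext, aad []byte, idx int, bit uint) (bool, error) {
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return false, err
	}
	ct := aead.Seal(nil, nonce, plaintext, aad)
	bad := append([]byte(nil), aad...)
	bad[idx] ^= 1 << bit
	_, err := open(aead, nonce, ct, bad)
	return errors.Is(err, errIntegrity), nil
}

func main() {
	aead, err := newAEAD()
	if err != nil {
		panic(err)
	}
	aad := []byte("pebble-key")
	for idx := range aad {
		for bit := uint(0); bit < 8; bit++ {
			ok, err := flipRejected(aead, []byte("value"), aad, idx, bit)
			if err != nil {
				panic(err)
			}
			if !ok {
				fmt.Printf("flip at byte %d bit %d was accepted\n", idx, bit)
				return
			}
		}
	}
	fmt.Println("all single-bit AAD flips rejected")
}
```

Binding the storage key into the AAD is what makes a ciphertext copied to a different key fail authentication, which is the relocation defence the test comment refers to.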
