backup: snapshot_reader (Phase 0a foundation for snapshot-decode binary)#792
backup: snapshot_reader (Phase 0a foundation for snapshot-decode binary)#792bootjp wants to merge 1 commit into
Conversation
Adds internal/backup/snapshot_reader.go: the .fsm file parser that
underpins the cmd/elastickv-snapshot-decode binary (design
docs/design/2026_04_29_proposed_snapshot_logical_decoder.md
lines 440-491). Mirrors the native Pebble snapshot format produced
by store/lsm_store.go::pebbleSnapshotMagic + restoreBatchLoopInto.
File shape consumed:
[8 bytes] magic "EKVPBBL1"
[8 bytes] lastCommitTS (LittleEndian uint64)
repeated:
[8 bytes] keyLen (LittleEndian uint64)
[keyLen] encoded key = <userKey><invTS(8 BE)>
[8 bytes] valLen (LittleEndian uint64)
[valLen] encoded value = <flags(1)><expireAt(8 LE)><body>
Per-entry decoding strips:
- The 8-byte inverted-TS suffix from the key (recovers commit_ts
via XOR-inversion across the math.MaxUint64 boundary).
- The 9-byte value header from the value (surfaces tombstone flag,
encryption_state, expireAt).
API surface (ReadSnapshot + SnapshotHeader + SnapshotEntry +
sentinel errors). Callback-based so callers can stream
multi-GB snapshots without buffering the whole file in memory; the
emitted SnapshotEntry's byte slices alias scratch buffers and the
caller must bytes.Clone for retention.
Fail-closed contract (matches design 7.1):
- ErrSnapshotBadMagic - first 8 bytes not "EKVPBBL1" (operator
pointed at the wrong file)
- ErrSnapshotTruncated - EOF mid-entry (a clean inter-entry EOF
is the normal terminator and is NOT
an error)
- ErrSnapshotShortKey - encoded key < 8 bytes (no room for TS)
- ErrSnapshotShortValue - encoded value < 9 bytes (no room for
value header)
- ErrSnapshotEncryptedReserved - reserved encryption_state bits
(0b10, 0b11) or bits 3-7 set; same
fail-closed trip-wire as
store.ErrEncryptedValueReservedState
- ErrSnapshotEncryptedEntry - encState=0b01 (encrypted).
Phase 0a does not link the decryption
keyring; Stage 8 of the encryption
rollout adds a keyring-aware variant.
Self-review:
1. Data loss - tombstone flag surfaced verbatim so callers can
choose to suppress (default Phase 0a backup behavior) or
include (a future diagnostic dump might want both). EOF
distinction between clean terminator and truncation prevents
silently dropping a partial entry.
2. Concurrency - reader is single-pass over an io.Reader; no
shared state, no goroutines.
3. Performance - per-entry slices alias a fixed 64 KiB key buffer
and a growable value buffer. For a 10M-entry snapshot this
keeps allocs O(1) outside the value-grow boundary. bufio
wrapper amortises read syscalls.
4. Consistency - encryption_state bits are decoded into the
identical 4-way switch (cleartext / encrypted / 0b10 / 0b11)
that store/lsm_store.go::decodeValue uses, so a future
live-side enum extension fails closed here too.
5. Coverage - 17 table-driven tests in snapshot_reader_test.go
covering: header-only, single-entry round-trip, tombstone
flag, multi-entry order preservation, bad magic, truncated
header, truncated entry, short-key, short-value, encrypted
entry, all four reserved-encState cases, callback-error
propagation, MVCC TS XOR-inversion across uint64 boundary,
empty-value body.
Caller audit (per /loop standing instruction): this PR adds a
NEW public API surface (ReadSnapshot, SnapshotEntry,
SnapshotHeader, PebbleSnapshotMagic, the Err* sentinels). No
prior callers exist. Verified via
'grep -rn "ReadSnapshot|SnapshotEntry|SnapshotHeader" --include=*.go'
- all matches are inside internal/backup/snapshot_reader{,test}.go.
The follow-up PR (cmd/elastickv-snapshot-decode driver +
dispatcher) will be this API's first consumer.
|
@claude review |
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Claude finished @bootjp's task in 2m 53s —— View job Code Review —
|
| Constant | store/lsm_store.go |
snapshot_reader.go |
|---|---|---|
| timestamp suffix size | timestampSize = 8 |
snapshotTSSize = 8 ✓ |
| value header size | valueHeaderSize = 9 |
snapshotValueHeaderSize = 9 ✓ |
| tombstone mask | 0b0000_0001 |
0b0000_0001 ✓ |
| encState mask/shift | 0b0000_0110 / 1 |
0b0000_0110 / 1 ✓ |
| reserved high-bit mask | 0b1111_1000 |
0b1111_1000 ✓ |
| magic bytes | {'E','K','V','P','B','B','L','1'} |
identical ✓ |
Key encoding (userKey || BigEndian(^ts)) and value encoding (flags(1) || expireAt(8 LE) || body) match fillEncodedKey / fillEncodedValue exactly.
Summary
| Severity | Issue |
|---|---|
| Bug | Discarded eof bool → wrong sentinel (ErrSnapshotShortValue vs ErrSnapshotTruncated) + dead io.EOF branch + missing test |
| Robustness | Unbounded allocation in readExactGrow on corrupt length field |
| Minor | Typo snapshotEncStateCleartx → snapshotEncStateCleartext |
| Minor | Double WithStack on key-read error path |
| Design | SnapshotHeader invisible for empty snapshots |
The format compatibility and the fail-closed encryption/reserved-state logic are solid. The bug above is the only one that changes which sentinel callers see, and it only manifests on a specific truncation pattern not covered by the existing tests.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 42fb7ba3db
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if err != nil { | ||
| return false, cockroachdberr.WithStack(err) | ||
| } | ||
| vLen, _, err := readEntryLen(r) |
There was a problem hiding this comment.
Return truncation when value length is missing
When a file ends immediately after the key bytes, readEntryLen returns (0, true, nil), but this call discards the eof flag and proceeds as if the entry had a zero-length value. That makes this mid-entry truncation surface as ErrSnapshotShortValue rather than the advertised ErrSnapshotTruncated, so callers cannot reliably distinguish a cut-off snapshot from a malformed encoded value. Please check the eof result here and return ErrSnapshotTruncated for this case.
Useful? React with 👍 / 👎.
| if uint64(cap(dst)) < n { | ||
| dst = make([]byte, n) |
There was a problem hiding this comment.
Bound snapshot length prefixes before allocating
For a corrupt or adversarial snapshot with a huge key/value length prefix, this path allocates make([]byte, n) directly from the untrusted uint64 length. That can panic with len out of range or OOM the operator decoder instead of failing closed; the live restore path guards the same fields with maxPebbleEncodedKeySize / maxSnapshotValueSize before allocation. Please validate kLen and vLen against equivalent bounds before calling this helper.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Code Review
This pull request introduces a native Pebble snapshot reader and associated tests to decode entries while stripping MVCC metadata. The reviewer identified several critical issues: a lack of bounds checking on key and value lengths that could lead to resource exhaustion or panics, improper handling of truncated snapshots when reading value lengths, and an inefficient use of reflection in the header parsing logic.
| valBuf *[]byte, | ||
| fn func(SnapshotHeader, SnapshotEntry) error, | ||
| ) (bool, error) { | ||
| kLen, eof, err := readEntryLen(r) |
There was a problem hiding this comment.
The reader lacks bounds checking for kLen and vLen. Since these lengths are read directly from the input stream as uint64, a malformed or malicious snapshot file could specify extremely large lengths, leading to massive memory allocations and potential Out-Of-Memory (OOM) crashes in readExact and readExactGrow. Additionally, on 32-bit systems, converting a large uint64 to a slice index or length (which are int) will cause a panic.
It is recommended to enforce maximum size limits for keys and values, similar to the limits used in store/lsm_store.go (e.g., maxPebbleEncodedKeySize and maxSnapshotValueSize).
References
- When deserializing data with a size header, pre-allocating a buffer is acceptable if a strictly enforced maximum size limit is checked before allocation and the entire data must be materialized in memory.
| vLen, _, err := readEntryLen(r) | ||
| if err != nil { | ||
| // A clean EOF here means the snapshot truncated between | ||
| // the key bytes and the value-length field — not the | ||
| // same as a clean inter-entry EOF. | ||
| if cockroachdberr.Is(err, io.EOF) { | ||
| return false, cockroachdberr.WithStack(ErrSnapshotTruncated) | ||
| } | ||
| return false, err | ||
| } |
There was a problem hiding this comment.
The readOneEntry function ignores the eof return value from readEntryLen when reading the value length (vLen). If the snapshot is truncated exactly before the vLen field, readEntryLen returns (0, true, nil). In this case, err is nil, and the code proceeds to call readExactGrow with vLen=0, which succeeds. Subsequently, decodeSnapshotEntry will fail with ErrSnapshotShortValue because the value header is missing. This should instead return ErrSnapshotTruncated to accurately reflect the state of the stream.
Additionally, the check cockroachdberr.Is(err, io.EOF) is redundant because readEntryLen already handles io.EOF and io.ErrUnexpectedEOF, returning either (0, true, nil) or ErrSnapshotTruncated.
vLen, vEof, err := readEntryLen(r)
if err != nil {
return false, err
}
if vEof {
return false, cockroachdberr.WithStack(ErrSnapshotTruncated)
}| var ts uint64 | ||
| if err := binary.Read(r, binary.LittleEndian, &ts); err != nil { | ||
| return SnapshotHeader{}, cockroachdberr.WithStack(err) | ||
| } |
There was a problem hiding this comment.
Using binary.Read with a pointer to a scalar value is inefficient in Go because it uses reflection. For performance-sensitive code like a snapshot reader, it is better to read into a small byte array and use binary.LittleEndian.Uint64 directly, as done in readEntryLen.
| var ts uint64 | |
| if err := binary.Read(r, binary.LittleEndian, &ts); err != nil { | |
| return SnapshotHeader{}, cockroachdberr.WithStack(err) | |
| } | |
| var tsBuf [8]byte | |
| if _, err := io.ReadFull(r, tsBuf[:]); err != nil { | |
| return SnapshotHeader{}, cockroachdberr.WithStack(err) | |
| } | |
| ts := binary.LittleEndian.Uint64(tsBuf[:]) |
Summary
Adds
internal/backup/snapshot_reader.go: the.fsmfile parserthat underpins the
cmd/elastickv-snapshot-decodebinary (designdocs/design/2026_04_29_proposed_snapshot_logical_decoder.mdlines440-491). This is the foundational step toward Phase 0a's
operator-facing decoder; the dispatcher + cmd driver that consume
this reader come in a follow-up PR.
Mirrors the native Pebble snapshot format produced by
store/lsm_store.go::pebbleSnapshotMagic + restoreBatchLoopInto:Per-entry decoding strips:
commit_tsvia XORencryption_state,expireAtAPI surface:
ReadSnapshot(io.Reader, callback)+SnapshotHeader+SnapshotEntry+ sentinel errors. Callback-based so callers canstream multi-GB snapshots without buffering; emitted byte slices
alias scratch buffers, so callers must
bytes.Clonefor retention.Fail-closed contract (matches design §7.1)
ErrSnapshotBadMagic— first 8 bytes not"EKVPBBL1"(operatorpointed the binary at an MVCC streaming snapshot, a tar, etc.)
ErrSnapshotTruncated— EOF mid-entry. A clean inter-entry EOFis the normal terminator and is NOT an error.
ErrSnapshotShortKey— encoded key < 8 bytesErrSnapshotShortValue— encoded value < 9 bytesErrSnapshotEncryptedReserved— reservedencryption_statebits (
0b10,0b11) or bits 3-7 set — fail-closed trip-wireparallel to
store.ErrEncryptedValueReservedStateErrSnapshotEncryptedEntry—encState=0b01. Phase 0a doesnot link the decryption keyring; Stage 8 of the encryption
rollout will add a keyring-aware variant.
Self-review (5 lenses)
choose to suppress for backup output or include for
diagnostic dumps). EOF distinguishes clean terminator from
truncation.
io.Reader; noshared state, no goroutines.
buffer + a growable value buffer. For a 10M-entry snapshot
this keeps allocs O(1) outside the value-grow path.
bufioamortises read syscalls.
encryption_stateis decoded into theidentical 4-way switch (cleartext / encrypted /
0b10/0b11) thatstore/lsm_store.go::decodeValueuses, so afuture live-side enum extension fails closed here too.
snapshot_reader_test.go:encStatecasesuint64boundaryCaller audit (per
/loopstanding instruction)This PR adds a NEW public API surface (
ReadSnapshot,SnapshotEntry,SnapshotHeader,PebbleSnapshotMagic, theErr*sentinels). No prior callers exist. Verified viagrep -rn "ReadSnapshot|SnapshotEntry|SnapshotHeader"across the tree — all matches are inside the two new files. The
follow-up PR (cmd driver + dispatcher) is this API's first consumer.
Test plan
go test -race ./internal/backup/→ okgolangci-lint run ./internal/backup/...→ 0 issuesgo build ./...→ okgo vet ./internal/backup/...→ ok