Commit 9cac82a
docs/design: propose Phase 0b snapshot logical encoder (#823)
## Summary
Doc-first PR for **Phase 0b** of the snapshot logical backup work
(CLAUDE.md requires a `*_proposed_*` design doc to land before any
wire-format implementation). Two commits:
1. **Promote** `snapshot_logical_decoder` `proposed`→`partial` (Phase 0a
fully shipped: PRs #790/#791/#792/#806/#810) in a dedicated commit per
the `docs/design/README` lifecycle convention.
2. **Propose** `2026_05_25_proposed_snapshot_logical_encoder.md` —
`cmd/elastickv-snapshot-encode`, the inverse of the Phase 0a decoder.
## Why a separate design doc
The encoder is not a mechanical mirror of the decoder. Three decisions
arise only on the encode side, each a wire-format decision the parent
doc left at sketch level:
- **Internal-index reconstruction** (the load-bearing decision): the
decoder *drops* every re-derivable internal index (Redis TTL scan index,
DynamoDB GSI rows, SQS vis/dedup/group/by-age side records, per-scope
generation counters). A *loadable* `.fsm` must contain them or the
restored node serves wrong results (TTL'd keys never expire, GSI queries
return nothing). The encoder must rebuild the full internal keyspace,
mirroring the live adapter index builders (duplicated into
`internal/backup` behind the same offline-tool boundary +
staleness-review discipline already used for the snapshot-reader
constants).
- **MVCC re-encoding**: the directory tree carries no per-key
`commit_ts` (decode discards it), so the encoder stamps every key with
`invTS = ^last_commit_ts` from `MANIFEST.json`. This keeps every restored
row at or below the HLC ceiling seeded from the snapshot header.
- **No CRC32C footer**: the parent doc's format sketch showed a trailing
CRC, but that framing belongs to the *MVCC streaming-restore* path, not
the native EKVPBBL1 snapshot that the decoder reads and the encoder must
emit. The authoritative target format is pinned down against
`store/snapshot_pebble.go` `WriteTo` and
`internal/backup/snapshot_reader.go` `ReadSnapshot`.
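
The MVCC stamping rule above can be sketched roughly as follows. This is an illustration only: it assumes a 64-bit timestamp whose bitwise complement is appended to the user key as 8 big-endian bytes. The names `invertTS` and `stampKey` and the key layout are hypothetical and are not taken from `store/snapshot_pebble.go`.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// invertTS returns the bitwise complement of a commit timestamp, so that
// newer (larger) timestamps produce smaller values and sort first.
func invertTS(commitTS uint64) uint64 {
	return ^commitTS
}

// stampKey appends the inverted timestamp to a user key as 8 big-endian
// bytes. Hypothetical layout, for illustration only.
func stampKey(userKey []byte, lastCommitTS uint64) []byte {
	out := make([]byte, len(userKey)+8)
	copy(out, userKey)
	binary.BigEndian.PutUint64(out[len(userKey):], invertTS(lastCommitTS))
	return out
}

func main() {
	older := stampKey([]byte("k"), 100)
	newer := stampKey([]byte("k"), 200)
	// Under bytewise comparison the newer version sorts before the older one.
	fmt.Println(bytes.Compare(newer, older) < 0)
}
```

Because every key receives the same `last_commit_ts`, no restored version can carry a timestamp above the value recorded in `MANIFEST.json`.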
## Contents
- Authoritative `.fsm` target format (sorted entries, size caps,
cleartext-only).
- Per-adapter reverse-encoder breakdown (Redis / DynamoDB / S3 / SQS),
route-for-route against `internal/backup/decode.go`.
- Directory-level round-trip self-test (`dir -> encode -> .fsm -> decode
-> dir'`, which must match exactly; the reverse direction, byte-identical
`.fsm` output, is explicitly a non-goal).
- Version/format gate + `ENCODE_INFO.json` provenance (`cluster_id`,
key-format version).
- Two **decision gates** flagged for review during implementation: GSI
derivation and SQS side-record derivation (full reconstruction vs.
lazy-rebuild fallback).
- Per-adapter milestone plan mirroring Phase 0a.
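
The directory-level round-trip check listed above could be approximated by comparing per-file content hashes of `dir` and `dir'`. The sketch below shows only that comparison step. The encode and decode steps are omitted, and `treeDigest`, `sameTree`, `mustTempTree`, and the example path `redis/strings/k1` are hypothetical names introduced for illustration.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// treeDigest maps each regular file's slash-separated relative path to the
// SHA-256 of its contents. Empty directories are ignored in this sketch.
func treeDigest(root string) (map[string][sha256.Size]byte, error) {
	out := make(map[string][sha256.Size]byte)
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.Type().IsRegular() {
			return nil
		}
		rel, err := filepath.Rel(root, path)
		if err != nil {
			return err
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		out[filepath.ToSlash(rel)] = sha256.Sum256(data)
		return nil
	})
	return out, err
}

// sameTree reports whether two trees hold the same files with the same bytes.
func sameTree(a, b string) (bool, error) {
	da, err := treeDigest(a)
	if err != nil {
		return false, err
	}
	db, err := treeDigest(b)
	if err != nil {
		return false, err
	}
	if len(da) != len(db) {
		return false, nil
	}
	for rel, sum := range da {
		if other, ok := db[rel]; !ok || other != sum {
			return false, nil
		}
	}
	return true, nil
}

// mustTempTree writes the given files into a fresh temp directory (demo helper).
func mustTempTree(files map[string]string) string {
	root, err := os.MkdirTemp("", "roundtrip-*")
	if err != nil {
		panic(err)
	}
	for rel, content := range files {
		full := filepath.Join(root, filepath.FromSlash(rel))
		if err := os.MkdirAll(filepath.Dir(full), 0o755); err != nil {
			panic(err)
		}
		if err := os.WriteFile(full, []byte(content), 0o644); err != nil {
			panic(err)
		}
	}
	return root
}

func main() {
	a := mustTempTree(map[string]string{"redis/strings/k1": "v1"})
	defer os.RemoveAll(a)
	b := mustTempTree(map[string]string{"redis/strings/k1": "v1"})
	defer os.RemoveAll(b)
	ok, err := sameTree(a, b)
	if err != nil {
		panic(err)
	}
	fmt.Println("round trip exact:", ok)
}
```

Comparing at the directory level rather than the `.fsm` byte level matches the stated non-goal: two `.fsm` files may differ in bytes while still decoding to the same tree.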
## Risk
Docs only — no code, no behavior change.
## Self-review (5 lenses)
- **Data loss**: none — docs only. The doc itself hardens the *future*
encoder's data-loss surface (index reconstruction, fail-closed on
oversize entries, round-trip gate before finalize).
- **Concurrency/distributed**: n/a (offline tool design).
- **Performance**: doc notes the in-memory-sort memory bound and defers
an external-sort follow-up.
- **Consistency**: MVCC re-encoding section keeps restored rows at/below
the HLC ceiling; documents why per-key `commit_ts` loss is invisible to
the round-trip.
- **Test coverage**: P0/P1/P2 test plan enumerated; per-adapter
cross-check vs. live index builders required.
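
The "fail-closed on oversize entries" and in-memory-sort points above might look roughly like this in an encoder's final pass. The caps `maxKeyLen` and `maxValueLen`, the `entry` type, and `prepareEntries` are hypothetical placeholders; the real limits would come from the snapshot reader constants, which are not reproduced here.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"sort"
)

// entry is one key/value pair destined for the output snapshot.
type entry struct {
	Key, Value []byte
}

// Hypothetical size caps, chosen only for this illustration.
const (
	maxKeyLen   = 1 << 10
	maxValueLen = 1 << 20
)

var errOversize = errors.New("entry exceeds size cap")

// prepareEntries rejects the whole batch if any entry is oversize
// (fail closed), then returns a copy sorted by key bytes.
func prepareEntries(in []entry) ([]entry, error) {
	for _, e := range in {
		if len(e.Key) > maxKeyLen || len(e.Value) > maxValueLen {
			return nil, fmt.Errorf("%w: key %q", errOversize, e.Key)
		}
	}
	out := append([]entry(nil), in...)
	sort.Slice(out, func(i, j int) bool {
		return bytes.Compare(out[i].Key, out[j].Key) < 0
	})
	return out, nil
}

func main() {
	sorted, err := prepareEntries([]entry{{Key: []byte("b")}, {Key: []byte("a")}})
	if err != nil {
		panic(err)
	}
	for _, e := range sorted {
		fmt.Printf("%s\n", e.Key)
	}
}
```

Sorting in memory holds every entry at once, which is the memory bound the doc notes; an external sort would lift that bound at the cost of extra I/O.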
## Test plan
- [ ] Design review of the two decision gates (GSI / SQS side records).
- [ ] Confirm the MVCC re-encoding `last_commit_ts` stamping is
acceptable for the restore HLC-ceiling seeding path.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Documentation**
  * Updated snapshot decoder design documentation; Phase 0a marked complete with Phase 0b encoder boundaries clarified
  * Added snapshot encoder design specification with requirements for data reconstruction and validation
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

2 files changed
Lines changed: 487 additions & 8 deletions
Lines changed: 26 additions & 8 deletions