backup: KEYMAP.jsonl writer/reader and MANIFEST.json schema (Phase 0a)#712
backup: KEYMAP.jsonl writer/reader and MANIFEST.json schema (Phase 0a)#712bootjp wants to merge 7 commits intofeat/backup-phase0a-filenamefrom
Conversation
Builds on PR #711 (filename encoding). Adds two more foundation pieces of the Phase 0 logical-backup decoder. KEYMAP.jsonl (internal/backup/keymap.go) - Append-only JSONL stream of {encoded, original (b64url), kind} records. - Records exist only for entries whose original bytes are NOT recoverable from the encoded filename alone: - KindSHAFallback (segments rendered as <sha32>__<truncated>) - KindS3LeafData (S3 path collisions renamed to .elastickv-leaf-data) - KindMetaCollision (user object key ending in .elastickv-meta.json) - KeymapWriter: streaming append, json encoder configured to skip HTML escapes so user-key bytes round-trip cleanly. Refuses empty encoded or kind so producer bugs surface loudly. Count() exposed for the "omit empty file" decision. - KeymapReader: line-by-line scanner with bounded buffer (1 MiB); blank lines surface as ErrInvalidKeymapRecord rather than being silently skipped so truncated dumps are recognised. - LoadKeymap: convenience helper that materialises the file as a map (last-wins on duplicates). MANIFEST.json (internal/backup/manifest.go) - Manifest, Source, Live, Adapters, Adapter, Exclusions structs matching the schema in docs/design/2026_04_29_proposed_snapshot_logical_decoder.md. - CurrentFormatVersion = 1; ReadManifest refuses format_version > current and format_version == 0 (ErrUnsupportedFormatVersion). - Phase discriminator constants for Phase 0 ("phase0-snapshot-decode") and Phase 1 ("phase1-live-pinned"); Phase 0 manifests must not set Live, Phase 1 must not set Source -- both validated at write and read time. - DisallowUnknownFields on read so format drift surfaces loudly. - Pretty-printed output (2-space indent, no HTML escapes) since MANIFEST.json is operator-facing. - NewPhase0SnapshotManifest seeds the policy fields with the documented defaults so callers can focus on populating per-dump metadata. Tests cover round-trip, sticky-error semantics, last-wins dedup, HTML-escape suppression, future-version refusal, unknown-field refusal, unknown-phase refusal, and the cross-phase Source/Live exclusion rules.
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces the metadata infrastructure for a backup system, implementing the MANIFEST.json and KEYMAP.jsonl file formats. The manifest provides a central record of backup versions, phases, and adapter scopes, while the keymap facilitates the reversal of encoded filenames to their original keys. Feedback highlights an opportunity to improve the manifest's representation of adapter scopes using pointers to distinguish between excluded and empty states, and suggests renaming a misleading test case in the keymap suite.
| type Adapters struct { | ||
| DynamoDB Adapter `json:"dynamodb"` | ||
| S3 Adapter `json:"s3"` | ||
| Redis Adapter `json:"redis"` | ||
| SQS Adapter `json:"sqs"` | ||
| } |
There was a problem hiding this comment.
The documentation for the Adapters struct describes a distinction between nil and empty values to indicate whether an adapter was included in the scope filter. However, the current implementation uses non-pointer Adapter structs, which means they are always present in the JSON. This can lead to state inconsistencies where the 'not included' state is silently dropped or misrepresented during serialization. Following the rule to avoid silently dropping entries during serialization or normalization, consider using pointers for the fields in the Adapters struct and applying omitempty to them to ensure the intended state is preserved on disk.
type Adapters struct {
DynamoDB *Adapter json:"dynamodb,omitempty"
S3 *Adapter json:"s3,omitempty"
Redis *Adapter json:"redis,omitempty"
SQS *Adapter json:"sqs,omitempty"
}References
- Avoid silently dropping entries during serialization or normalization. If placeholder or incomplete entries are valid in some contexts, provide a specific, more lenient handling path for them to ensure they are preserved on disk and avoid state inconsistencies.
| } | ||
| } | ||
|
|
||
| func TestKeymapReader_AcceptsBlankLinesByPolicy(t *testing.T) { |
There was a problem hiding this comment.
The test name TestKeymapReader_AcceptsBlankLinesByPolicy is misleading because the test actually asserts that blank lines are rejected with ErrInvalidKeymapRecord. This matches the intended strict JSONL behavior described in the comments. The test name should be updated to reflect that blank lines are rejected.
| func TestKeymapReader_AcceptsBlankLinesByPolicy(t *testing.T) { | |
| func TestKeymapReader_RejectsBlankLines(t *testing.T) { |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a0d92449a4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if err := json.Unmarshal(line, &rec); err != nil { | ||
| r.err = errors.Wrap(ErrInvalidKeymapRecord, err.Error()) | ||
| return KeymapRecord{}, false, r.err | ||
| } | ||
| if rec.Encoded == "" || rec.Kind == "" { | ||
| r.err = errors.Wrap(ErrInvalidKeymapRecord, "missing encoded or kind") | ||
| return KeymapRecord{}, false, r.err | ||
| } | ||
| return rec, true, nil |
There was a problem hiding this comment.
Reject malformed base64 in KEYMAP records
KeymapReader.Next documents malformed base64 as an invalid record, but it currently only unmarshals JSON and checks encoded/kind, so records with invalid original are accepted and cached by LoadKeymap until a later Original() call fails. This defers corruption detection and can make a damaged KEYMAP appear valid during initial ingest.
Useful? React with 👍 / 👎.
| dec := json.NewDecoder(r) | ||
| dec.DisallowUnknownFields() // surface format drift loudly | ||
| if err := dec.Decode(&m); err != nil { | ||
| return Manifest{}, errors.Wrap(ErrInvalidManifest, err.Error()) |
There was a problem hiding this comment.
Reject trailing bytes after MANIFEST JSON object
ReadManifest decodes exactly one JSON value and returns success without checking for EOF, so inputs like {"...":...}{"extra":1} (or other trailing junk) are accepted. That weakens the manifest's strictness guarantees and can hide file corruption or concatenation errors that should fail validation.
Useful? React with 👍 / 👎.
…tickv into feat/backup-phase0a-keymap-manifest
- Adapters fields are now *Adapter pointers (Gemini #98). Nil -> excluded, non-nil empty -> included-with-no-scopes, non-nil populated -> normal. Previous non-pointer struct collapsed both "excluded" and "included-empty" into the same on-disk shape; gemini's reference (don't silently drop entries during serialization) was the right call. Test TestAdaptersStruct_NilVsEmptyDistinguishedOnDisk covers it. - KeymapReader.Next now validates the base64-encoded `original` field at parse time (Codex P1 #179). Previously the JSON parsed fine and the bad base64 surfaced lazily on Original() — that defers corruption detection past initial ingest. Test TestKeymapReader_RejectsMalformedBase64AtParseTime locks in the parse-time validation. - ReadManifest now rejects trailing bytes via dec.More() (Codex P2 #194). Two manifests concatenated, garbage tails, partial-write artifacts: all surface as ErrInvalidManifest. Tests TestReadManifest_RejectsTrailingBytes and TestReadManifest_RejectsTrailingNonWhitespace. - Test name TestKeymapReader_AcceptsBlankLinesByPolicy renamed to TestKeymapReader_RejectsBlankLines (Gemini #160). The test was already asserting rejection; only the name was misleading.
|
Review feedback addressed in Gemini MEDIUM #98 — Adapters nil-vs-empty must be distinguishable on disk Codex P1 #179 — Validate base64 at parse time Codex P2 #194 — Reject trailing bytes after MANIFEST JSON Gemini MEDIUM #160 — Test name fix @claude review |
|
Claude finished @bootjp's task in 2m 38s —— View job Review of
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c3d1e6fdf3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| case PhasePhase1LivePinned: | ||
| if m.Source != nil { | ||
| return errors.Wrap(ErrInvalidManifest, "phase1 must not set source") |
There was a problem hiding this comment.
Require live metadata for phase1 manifests
Phase 1 manifests are accepted even when live is completely absent, because validation only rejects source in the phase1 branch. That allows WriteManifest/ReadManifest to succeed for a phase1-live-pinned dump with no pinned read_ts, which breaks the manifest’s ability to describe the consistency point and can cause downstream restore/audit logic to treat an incomplete manifest as valid.
Useful? React with 👍 / 👎.
…to feat/backup-phase0a-keymap-manifest
, round 2) Codex P1 #295: validation only rejected source on phase1 manifests but accepted a phase1 dump with live entirely absent. A phase1 dump's whole point is the cluster-wide read_ts pin recorded under Live; a manifest that omits Live cannot describe its consistency point and downstream restore/audit logic must not silently treat it as valid. validatePhaseSpecific now requires Live != nil AND Live.ReadTS != 0 for phase1 dumps. Tests: TestManifest_Phase1MustSetLive, TestManifest_Phase1RejectsZeroReadTS, TestManifest_Phase1WithLiveAndNonZeroReadTSIsValid.
|
Codex P1 #295 addressed in @claude review |
|
Claude finished @bootjp's task in 2m 29s —— View job Review of
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e4cd7d8ea7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| func ReadManifest(r io.Reader) (Manifest, error) { | ||
| var m Manifest | ||
| dec := json.NewDecoder(r) | ||
| dec.DisallowUnknownFields() // surface format drift loudly |
There was a problem hiding this comment.
Allow unknown MANIFEST fields for same-major evolution
ReadManifest rejects any unknown JSON key via DisallowUnknownFields, which makes older readers fail as soon as a newer producer adds an optional field while keeping format_version at the same major value. That breaks the stated same-major compatibility model (optional field additions) and turns otherwise valid manifests into hard read failures during mixed-version operation.
Useful? React with 👍 / 👎.
, round 3) Codex P1 #205 (round 2): DisallowUnknownFields broke the documented same-major minor-evolution path. A newer producer that adds an optional field at the same format_version turns older readers into hard read failures during mixed-version operation, even though both nodes are on the "same major" by contract. The format_version field IS the breaking-change signal. Major bumps are gated by ErrUnsupportedFormatVersion; same-major minor evolution must silently tolerate optional new fields. Removed dec.DisallowUnknownFields() from ReadManifest. Replaced TestReadManifest_RejectsUnknownFields with TestReadManifest_AcceptsUnknownFieldsForSameMajorMinorEvolution. The trailing-bytes guard (dec.More() check) is preserved -- it catches concatenation bugs which is a different concern from optional-field tolerance.
|
Codex P1 #205 (round 2) addressed in
Test @claude review |
|
Claude finished @bootjp's task in 1m 50s —— View job Review of
|
| Item | File | Status |
|---|---|---|
| Stale "whitespace" + "io.Discard" in trailing-bytes comment | manifest.go:219-225 |
Flagged rounds 1+2 |
(Codex P1 #295) issue ref in validatePhaseSpecific |
manifest.go:310-311 |
Flagged round 2 |
(Codex P1, follow-up) in new comment |
manifest.go:206 |
New this round |
Codex P1 #205 (round 2) in test comment |
manifest_test.go:148-150 |
New this round |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6f231fe790
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| Adapters Adapters `json:"adapters"` | ||
| Exclusions Exclusions `json:"exclusions"` | ||
| ChecksumAlgorithm string `json:"checksum_algorithm"` |
There was a problem hiding this comment.
Reject manifests that omit adapters/exclusions
ReadManifest accepts JSON where adapters and/or exclusions are missing, because these are decoded into non-pointer struct fields and validate() only checks scalar policy fields. In that case the sections silently become zero values and the manifest is treated as valid, which can hide truncated/corrupted MANIFEST files and drop provenance about what scopes/filters were actually used. This should fail validation instead of defaulting.
Useful? React with 👍 / 👎.
Codex P2 #146 round 3: ReadManifest accepted manifests where the adapters and/or exclusions sections were omitted entirely from the JSON, because they were decoded into non-pointer struct fields and validate() only checked scalar policy fields. The empty zero-value sections were treated as valid, hiding truncated or corrupted MANIFEST files and dropping provenance about which scopes/filters the producer used. Manifest.Adapters and Manifest.Exclusions are now *Adapters / *Exclusions pointer types. validateRequiredFields rejects nil for either with ErrInvalidManifest. NewPhase0SnapshotManifest seeds them as &Adapters{} / &Exclusions{} so producers continue to satisfy the contract by default; populating individual scopes reaches in via the now-non-nil pointer. Tests TestReadManifest_RejectsMissingAdapters and TestReadManifest_RejectsMissingExclusions cover the new validation path.
|
Codex P2 #146 (round 3) addressed in
Tests: @claude review |
|
I'll analyze this and get back to you. |

Summary
Stacked on top of #711 (filename encoding). Adds two more foundation pieces of the Phase 0 logical-backup decoder.
KEYMAP.jsonl (
internal/backup/keymap.go){encoded, original (b64url), kind}records.KindSHAFallback— segments rendered as<sha32>__<truncated>KindS3LeafData— S3 path collisions renamed to.elastickv-leaf-dataKindMetaCollision— user object key ending in.elastickv-meta.jsonKeymapWriter: streaming append, JSON encoder configured to skip HTML escapes so user-key bytes round-trip cleanly. Refuses emptyencodedorkindso producer bugs surface loudly.Count()exposed for the "omit empty file" decision.KeymapReader: line-by-line scanner with bounded buffer (1 MiB); blank lines surface asErrInvalidKeymapRecordrather than being silently skipped so truncated dumps are recognised.LoadKeymap: convenience helper that materialises the file as a map (last-wins on duplicates).MANIFEST.json (
internal/backup/manifest.go)docs/design/2026_04_29_proposed_snapshot_logical_decoder.md.CurrentFormatVersion = 1;ReadManifestrefusesformat_version > currentandformat_version == 0(ErrUnsupportedFormatVersion).Live, Phase 1 must not setSource— both validated at write and read time.DisallowUnknownFieldson read so format drift surfaces loudly.MANIFEST.jsonis operator-facing.NewPhase0SnapshotManifestseeds policy fields with the documented defaults.Test plan
go test -race ./internal/backup/...— pass.golangci-lint run ./internal/backup/...— clean.Source/Liveexclusion.Self-review
KeymapReaderreturns sticky errors so partial reads cannot be silently treated as success.KeymapWriter/KeymapReaderare not goroutine-safe (per-scope use); manifest helpers are pure.-raceclean.bufio.Writer(64 KiB) for the JSONL stream; bounded scanner buffer (1 MiB) on read.DisallowUnknownFields+ format-version gate prevent silent drift. The phase discriminator's structural rules are enforced symmetrically at write and read.Stacking
Base branch is
feat/backup-phase0a-filename(PR #711). When that lands, this PR's base will switch tomainautomatically.