backup: DynamoDB encoder for tables and items (Phase 0a) #716
bootjp wants to merge 11 commits into `feat/backup-phase0a-sqs` from `feat/backup-phase0a-dynamodb`
Conversation
Builds on PR #714. Adds the DynamoDB encoder for the Phase 0 logical-backup decoder.

Snapshot prefixes handled:

- `!ddb|meta|table|<base64url(table)>` -> `dynamodb/<table>/_schema.json` (DynamoTableSchema proto -> DescribeTable-shaped JSON; cluster-internal fields like `key_encoding_version` and `migrating_from_generation` are stripped from the public projection because they're not user-visible).
- `!ddb|item|<base64url(table)>|<gen>|<rest>` -> per-item JSON files under `dynamodb/<table>/items/`. Hash-only tables emit `items/<pk>.json`; composite-key tables emit `items/<pk>/<sk>.json`.
- `!ddb|gsi|...` ignored (derivable from items + schema; replaying GSI rows on restore would conflict with the destination's own index maintenance).
- `!ddb|meta|gen|...` ignored (operational counter, not user state).

Implementation choices:

- Lex order (`'i' < 'm'`) means items arrive before the table schema. The encoder buffers per encoded-table segment and emits at Finalize once the schema is known, parallel to the SQS encoder's strategy.
- Table-segment parsing is unambiguous: the base64url alphabet contains no `|`, so the first `|` after the prefix is the table/gen separator. No heuristic boundary detection is needed.
- Item filename derivation reads the hash and range key NAMES from the schema, then looks them up in the item's attributes map. A missing required-key attribute on an item is a structural error (the item could never have been GetItem-able) and surfaces as `ErrDDBInvalidItem` on Finalize.
- B-attribute (binary) primary keys take `EncodeBinarySegment` so they cannot collide with hex-shaped string keys; this matches the design's "binary keys take `b64.<base64url>`" rule.
- All 10 documented attribute kinds (S, N, B, BOOL, NULL, SS, NS, BS, L, M) are translated to their AWS-DynamoDB-JSON shapes. An empty oneof (a malformed proto value) surfaces as `NULL=true` so the dump remains deserialisable.
- `attributeValueToPublic` is split into scalar / set / composite helpers so the cyclomatic complexity stays under the package cap.
- Bundle mode (`--dynamodb-bundle-mode jsonl`) is left as a stub: `WithBundleJSONL(true)` makes Finalize return a clear "not implemented in this PR" error so the master pipeline can surface the capability gap. Per-item layout is the documented default.

Tests cover: hash-only and composite-key round-trips, binary primary key rendering, orphan items without schema, value-magic rejection on schema and item paths, missing-key-attribute rejection at Finalize, GSI rows ignored, all 10 attribute kinds round-tripping through JSON, empty-oneof -> NULL, the bundle-mode stub, and malformed table-segment key rejection.
Code Review
This pull request introduces a DynamoDB encoder designed to export table schemas and items into a structured filesystem layout for logical backups. The implementation handles the conversion of internal protobuf representations into AWS-compatible JSON formats, supporting various DynamoDB attribute types and secondary indexes. Feedback suggests adopting a fail-fast error handling strategy during the finalization phase when flushing tables and refining the serialization of set attributes to ensure consistent JSON output for empty or nil collections.
```go
var firstErr error
for _, st := range d.tables {
	if st.schema == nil {
		d.emitWarn("ddb_orphan_items",
			"encoded_table", st.encoded,
			"buffered_items", len(st.items))
		continue
	}
	if err := d.flushTable(st); err != nil && firstErr == nil {
		firstErr = err
	}
}
return firstErr
```
The current error handling in Finalize correctly implements a lenient path for incomplete entries (orphan items) by emitting a warning instead of failing or dropping them. This aligns with the repository rule to avoid silently dropping entries during serialization to ensure data preservation on disk. For actual flush errors, failing fast is appropriate.
Suggested change:

```diff
-var firstErr error
 for _, st := range d.tables {
 	if st.schema == nil {
 		d.emitWarn("ddb_orphan_items",
 			"encoded_table", st.encoded,
 			"buffered_items", len(st.items))
 		continue
 	}
-	if err := d.flushTable(st); err != nil && firstErr == nil {
-		firstErr = err
-	}
+	if err := d.flushTable(st); err != nil {
+		return err
+	}
 }
-return firstErr
+return nil
```
References
- Avoid silently dropping entries during serialization or normalization. If placeholder or incomplete entries are valid in some contexts, provide a specific, more lenient handling path for them to ensure they are preserved on disk and avoid state inconsistencies.
```go
func setAttributeValueToPublic(av *pb.DynamoAttributeValue) map[string]any {
	switch v := av.GetValue().(type) {
	case *pb.DynamoAttributeValue_Ss:
		return map[string]any{"SS": append([]string{}, v.Ss.GetValues()...)}
	case *pb.DynamoAttributeValue_Ns:
		return map[string]any{"NS": append([]string{}, v.Ns.GetValues()...)}
	case *pb.DynamoAttributeValue_Bs:
		return map[string]any{"BS": append([][]byte{}, v.Bs.GetValues()...)}
	}
	return nil
}
```
The current implementation of setAttributeValueToPublic can produce null in the JSON output for nil set slices. To avoid state inconsistencies during normalization and ensure entries are correctly preserved, handle nil set slices to produce an empty JSON array [] instead of null.
Suggested change:

```diff
 func setAttributeValueToPublic(av *pb.DynamoAttributeValue) map[string]any {
 	switch v := av.GetValue().(type) {
 	case *pb.DynamoAttributeValue_Ss:
-		return map[string]any{"SS": append([]string{}, v.Ss.GetValues()...)}
+		return map[string]any{"SS": append([]string(nil), v.Ss.GetValues()...)}
 	case *pb.DynamoAttributeValue_Ns:
-		return map[string]any{"NS": append([]string{}, v.Ns.GetValues()...)}
+		return map[string]any{"NS": append([]string(nil), v.Ns.GetValues()...)}
 	case *pb.DynamoAttributeValue_Bs:
-		return map[string]any{"BS": append([][]byte{}, v.Bs.GetValues()...)}
+		return map[string]any{"BS": append([][]byte(nil), v.Bs.GetValues()...)}
 	}
 	return nil
 }
```
References
- Avoid silently dropping entries during serialization or normalization to ensure they are preserved on disk and avoid state inconsistencies.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 63bfe8db9b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```go
enc := rest[:idx]
if _, err := base64.RawURLEncoding.DecodeString(enc); err != nil {
	return "", errors.Wrap(ErrDDBMalformedKey, err.Error())
}
return enc, nil
```
Preserve item generation when grouping DynamoDB rows
Item keys encode !ddb|item|<table>|<generation>|..., but this parser drops everything after the first separator and the encoder buffers rows only by table name. In real clusters, old-generation rows can coexist temporarily (e.g., delete/recreate before async cleanup finishes), and those stale rows will be emitted under the current schema; this can silently restore deleted data or make Finalize fail when primary-key names changed across generations. The backup path should retain generation and filter rows against the active schema generation (and any explicitly supported migration generation) instead of merging all generations.
Three issues, all valid.

**#237 Codex P1 -- preserve item generation when grouping rows.** Item keys carry `!ddb|item|<table>|<generation>|...`, but the prior parser dropped the generation and Finalize merged all generations under the active schema. In real clusters, in-flight delete/recreate cleanup leaves stale-gen rows visible in the snapshot for a window; those rows would silently restore as items under the new schema (potentially resurrecting deleted data, or failing Finalize when PK names changed across generations). Fix:

- `parseDDBItemKey` now returns `(encodedTable, generation)`.
- `ddbTableState.items` -> `itemsByGen map[uint64][]*pb.DynamoItem`.
- `flushTable` filters by `st.schema.GetGeneration()` and emits a `ddb_stale_generation_items` warning carrying the count and active gen so the operator can correlate the orphan window.

**#182 Gemini -- fail-fast on flush errors.** Finalize previously deferred the first error and continued through remaining tables. Real flush errors (out of disk, bad permissions) should surface immediately, not be misattributed to a later table whose flush also fails. Orphan-table warnings still continue (those are informational, not errors).

**#442 Gemini -- empty set serializes as `[]`, not `null`.** `setAttributeValueToPublic` now uses make+append so nil/empty SS/NS/BS become `{"SS":[]}` rather than `{"SS":null}`. Downstream tools that distinguish "present-but-empty" from "missing" no longer see a misleading null.

Tests: `TestDDB_StaleGenerationItemsExcludedAndWarned`, `TestDDB_EmptyStringSetSerializesAsEmptyArrayNotNull`, `TestDDB_ParseItemKeyExtractsGeneration`. Existing fixtures updated to set `Generation=1` on schemas (matching the item key gen).
3 issues addressed:

- Codex P1 #237 — Preserve item generation when grouping rows
- Gemini MEDIUM #182 — Fail fast on flush errors
- Gemini MEDIUM #442 — Empty set serializes as `[]`

@claude review
Claude finished @bootjp's task in 3m 30s — Review:

| Finding | Severity |
|---|---|
| Codex P1: generation filtering | ✅ Correctly fixed |
| Gemini #182: fail-fast flush errors | ✅ Correctly fixed |
| Gemini #442: empty set → `[]` | ✅ Correctly fixed |
| `MigratingFromGeneration` excluded without distinct warning | Low — conservative but silent |
| Non-deterministic `_schema.json` ordering | Low — readability / diffability |
| Single-stale-gen test only | Low — coverage gap |

The core correctness fixes are solid. The two "Low" findings would improve operator experience but don't affect backup safety. Happy to address any of them if you want them in this PR.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fdc9b36b6b
```go
		"stale_count", stale,
		"hint", "stale-gen rows are excluded from the dump; restore would otherwise emit them under the new schema")
}
active := st.itemsByGen[activeGen]
```
Include migration source generation when emitting DynamoDB items
flushTable now emits only st.itemsByGen[activeGen], but DynamoDB tables can be in a live migration state where schema.generation is new and schema.migrating_from_generation still contains readable data; the read path explicitly falls back to that source generation during migration (see readLogicalItemAt in adapter/dynamodb.go). In that state, this filter silently drops unmigrated rows from backups, so restoring the dump loses logically visible items if a snapshot is taken mid-migration. Fresh evidence in this revision is the hard filter at this line with no migrating_from_generation inclusion path.
Two P1 follow-ups from Codex.

**#100 Codex P1 -- overbroad fast path.** The previous `len(raw)*3 > 240` fast path tripped for any input longer than 80 bytes, including pure-ASCII keys whose 1:1 encoded form would have fit fine. That converted reversible keys into `ErrShaFallbackNeedsKeymap` unnecessarily. Replaced with:

1. `len(raw) > maxSegmentBytes` -> SHA fallback (1:1 lower bound; no encoding can fit).
2. `percentEncodeBounded`: stream-encodes with an in-loop overflow check so the partial allocation is bounded to `maxSegmentBytes` even on adversarial inputs that DO need escaping. Returns `("", false)` on overflow so the caller takes the SHA path without seeing the partial output.

`TestEncodeSegment_LongUnreservedASCIIEncodesAsIs` locks the correct 200-byte-ASCII round-trip.

**#146 Codex P1 -- `b64.` prefix collision.** A user STRING key like `"b64.foo"` was returned as-is by `EncodeSegment` (all unreserved) and then misclassified by `DecodeSegment` as a binary segment, decoding the base64 to `"foo"` instead of round-tripping. `EncodeSegment` now promotes any input whose percent-encoded form starts with the binary prefix to a real SHA fallback, parallel to the existing SHA-shape collision check, so KEYMAP.jsonl carries the original bytes. `TestEncodeSegment_KeyStartingWithBinaryPrefixIsPromotedToFallback` covers it.

The previous huge-input OOM-guard property (no all-at-once `3*len(raw)` allocation) is preserved by `percentEncodeBounded`; the existing `TestEncodeSegment_HugeInputDoesNotMaterialiseFullExpansion` still passes.
(round 2) Codex P1 #295: validation only rejected `source` on phase1 manifests but accepted a phase1 dump with `Live` entirely absent. A phase1 dump's whole point is the cluster-wide read_ts pin recorded under `Live`; a manifest that omits `Live` cannot describe its consistency point, and downstream restore/audit logic must not silently treat it as valid. `validatePhaseSpecific` now requires `Live != nil` AND `Live.ReadTS != 0` for phase1 dumps. Tests: `TestManifest_Phase1MustSetLive`, `TestManifest_Phase1RejectsZeroReadTS`, `TestManifest_Phase1WithLiveAndNonZeroReadTSIsValid`.
(round 2) Codex P1 #227: during a generation migration, `schema.Generation` is the new gen and `schema.MigratingFromGeneration` carries the source. The live read path (`adapter/dynamodb.go` `readLogicalItemAt`) falls back to the source for items not yet copied. The previous `flushTable` filtered to only `schema.GetGeneration()` and silently dropped unmigrated rows from the dump.

`flushTable` now emits items from BOTH generations. Order matters: the source is emitted FIRST, then the active gen, so when both generations carry the same primary key the active-gen content is the one that survives `writeFileAtomic`'s tmp+rename — matching the live code's "new gen wins on conflict" rule. The stale-generation warning was generalised to take an emit-set (`totalStaleItemsExcluding`); rows in any other generation are counted into the warning so the operator still sees the orphan window.

Tests: `TestDDB_MigrationSourceGenerationItemsAreEmitted`, `TestDDB_NewGenerationWinsOverMigrationSourceForSameKey`.
Codex P1 #227 addressed.

The stale-generation warning was generalised (`totalStaleItemsExcluding`).

@claude review
Claude finished @bootjp's task in 4m 21s — Review:

| Finding | Severity |
|---|---|
| Codex P1 #227: include migration-source-gen items | ✅ Correctly fixed |
| Emit order matches live `readLogicalItemAt` preference | ✅ Verified |
| `parseDDBItemKey` correct for ordered encoding | ✅ Verified |
| Same-PK collision resolves to active gen via tmp+rename | ✅ Verified |
| Three-generation scenario (stale + source + active) untested | Low |
| Non-deterministic `_schema.json` output (carried from prior round) | Low |

Core correctness is solid. The two Low findings are improvements for operator experience/testability, not correctness gaps.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ce5b7dac74
```go
rawName, err := base64.RawURLEncoding.DecodeString(encoded)
if err != nil {
```
Reject empty table segment in DynamoDB meta keys
HandleTableMeta accepts !ddb|meta|table| as valid because base64.RawURLEncoding.DecodeString("") succeeds, so a truncated/malformed key is treated as table name "" instead of returning ErrDDBMalformedKey. In corrupted snapshots this can route schema output into the wrong scope and mask key corruption that Phase 0 is supposed to surface loudly.
```go
genEnd := strings.IndexByte(afterTable, '|')
if genEnd <= 0 {
	return "", 0, errors.Wrapf(ErrDDBMalformedKey,
		"item key missing gen/rest separator: %q", key)
}
```
Reject item keys with no payload after generation
parseDDBItemKey only checks that a | exists after the generation, but it does not require bytes after that separator, so keys like !ddb|item|<table>|7| are accepted. This lets truncated item keys bypass malformed-key detection and be emitted using only value-side attributes, which can hide snapshot corruption instead of failing fast.
(round 3) Codex P1 #205 (round 2): `DisallowUnknownFields` broke the documented same-major minor-evolution path. A newer producer that adds an optional field at the same `format_version` turns older readers into hard read failures during mixed-version operation, even though both nodes are on the "same major" by contract. The `format_version` field IS the breaking-change signal: major bumps are gated by `ErrUnsupportedFormatVersion`; same-major minor evolution must silently tolerate optional new fields.

Removed `dec.DisallowUnknownFields()` from `ReadManifest`. Replaced `TestReadManifest_RejectsUnknownFields` with `TestReadManifest_AcceptsUnknownFieldsForSameMajorMinorEvolution`. The trailing-bytes guard (the `dec.More()` check) is preserved -- it catches concatenation bugs, which is a different concern from optional-field tolerance.
Two Codex P2 strict-validation follow-ups.

**#117 -- empty table-meta segment.** `HandleTableMeta` accepted `!ddb|meta|table|` (no encoded segment) because `base64.RawURLEncoding.DecodeString("")` returns empty bytes without error, so the schema would route under the empty table name. Now rejected with `ErrDDBMalformedKey` before the JSON decode. `TestDDB_RejectsTableMetaKeyWithEmptySegment`.

**#303 -- truncated item key.** `parseDDBItemKey` accepted `!ddb|item|<table>|7|` (gen separator present, no primary-key payload). The gen-end check was `genEnd > 0`, which a trailing `|` satisfies. Added a follow-up check that `genEnd+1 != len(afterTable)` so a payload-less key surfaces as `ErrDDBMalformedKey` rather than emitting under value-side attributes only. `TestDDB_RejectsItemKeyWithEmptyPrimaryKeyPayload`.
2 Codex P2 strict-validation follow-ups addressed:

- #117 — Empty table-meta segment.
- #303 — Truncated item key.

@claude review
Claude finished @bootjp's task in 2m 51s — Review:

| Finding | Severity |
|---|---|
| Codex P2 #117: empty table-meta segment rejected | ✅ Correctly fixed |
| Codex P2 #303: empty item-key payload rejected | ✅ Correctly fixed |
| Both guards trace correctly for normal + truncated inputs | ✅ Verified |
| `tableEnd <= 0` + new `genEnd+1` guards cover all truncation points | ✅ Verified |
| Three-generation scenario (stale + source + active) untested | Low — carried from round 2 |
| Non-deterministic `_schema.json` output | Low — carried from round 2 |

This PR is ready to merge from a correctness standpoint. The two Low items are quality-of-life improvements (testability and backup diff stability) that can land in a follow-up.
Summary

Stacked on top of #714. Adds the DynamoDB encoder for the Phase 0 logical-backup decoder.

Snapshot prefixes handled:

- `!ddb|meta|table|<base64url(table)>` → `dynamodb/<table>/_schema.json` (DescribeTable-shaped JSON)
- `!ddb|item|<base64url(table)>|<gen>|<rest>` → per-item JSON under `dynamodb/<table>/items/`: `items/<pk>.json`, `items/<pk>/<sk>.json`, or `items/b64.<base64url>[/...].json` (no collision with hex-shaped string keys)
- `!ddb|gsi|*` ignored (derivable; replaying would conflict with destination index maintenance)
- `!ddb|meta|gen|*` ignored (operational counter)

Why buffer + emit at Finalize

Lex order is `'i' < 'm'`, so items arrive before the schema. The encoder buffers per encoded-table segment and emits once the schema is known. Item filename derivation reads the hash/range key NAMES from the schema, then looks them up in the item's attributes map.

All 10 attribute kinds covered

S / N / B / BOOL / NULL / SS / NS / BS / L / M are translated to their AWS-DynamoDB-JSON shapes. An empty oneof (malformed proto value) surfaces as `NULL=true` so the dump remains deserialisable. `attributeValueToPublic` is split into scalar/set/composite helpers to keep cyclomatic complexity under the package cap.

Bundle mode is stubbed

`WithBundleJSONL(true)` makes Finalize return a clear "not implemented in this PR" error. Per-item layout (the design's documented default) is what this PR delivers.

Test plan

- `go test -race ./internal/backup/...` — pass.
- `golangci-lint run ./internal/backup/...` — clean.

Stacking

Base: `feat/backup-phase0a-sqs` (PR #714).