
backup: DynamoDB encoder for tables and items (Phase 0a) #716

Open
bootjp wants to merge 11 commits into feat/backup-phase0a-sqs from feat/backup-phase0a-dynamodb

Conversation

@bootjp
Owner

@bootjp bootjp commented Apr 30, 2026

Summary

Stacked on top of #714. Adds the DynamoDB encoder for the Phase 0 logical-backup decoder.

Snapshot prefixes handled:

  • !ddb|meta|table|<base64url(table)> → dynamodb/<table>/_schema.json (DescribeTable-shaped JSON)
  • !ddb|item|<base64url(table)>|<gen>|<rest> → per-item JSON under dynamodb/<table>/items/
    • Hash-only: items/<pk>.json
    • Composite: items/<pk>/<sk>.json
    • Binary key: items/b64.<base64url>[/...].json (no collision with hex-shaped string keys)
  • !ddb|gsi|* ignored (derivable; replaying would conflict with destination index maintenance)
  • !ddb|meta|gen|* ignored (operational counter)
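As a hedged sketch of how keys in these layouts split: the base64url alphabet (A–Z, a–z, 0–9, -, _) never contains '|', so the first '|' after the prefix is always the table/gen separator. The names below (`splitTableSegment`, `errMalformed`) are illustrative stand-ins, not the encoder's real API.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

var errMalformed = errors.New("malformed ddb key")

// splitTableSegment splits the portion after "!ddb|item|" (or "!ddb|meta|table|")
// at the first '|'. Unambiguous because base64url contains no '|'.
func splitTableSegment(rest string) (encodedTable, remainder string, err error) {
	idx := strings.IndexByte(rest, '|')
	if idx <= 0 {
		return "", "", errMalformed
	}
	enc := rest[:idx]
	// Validate the segment really is base64url before trusting it.
	if _, err := base64.RawURLEncoding.DecodeString(enc); err != nil {
		return "", "", errMalformed
	}
	return enc, rest[idx+1:], nil
}

func main() {
	enc, rem, err := splitTableSegment("dXNlcnM|7|abc") // "dXNlcnM" = base64url("users")
	fmt.Println(enc, rem, err)                          // → dXNlcnM 7|abc <nil>
}
```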

Why buffer + emit at Finalize

Lex order is 'i' < 'm', so items arrive before the schema. Encoder buffers per encoded-table-segment and emits once the schema is known. Item filename derivation reads the hash/range key NAMES from the schema, then looks them up in the item's attributes map.
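The buffer-then-flush strategy above can be sketched as follows. This is a minimal model with illustrative types (`tableState`, `encoder` are stand-ins, not the real structs in internal/backup):

```go
package main

import "fmt"

// Items arrive before the schema (lex order 'i' < 'm'), so they are
// buffered per encoded-table segment and only flushed at Finalize once
// the schema is known. Orphan tables (items but no schema) warn instead
// of failing.

type tableState struct {
	schema *string  // nil until the meta|table row arrives
	items  []string // buffered raw items
}

type encoder struct {
	tables map[string]*tableState
}

func newEncoder() *encoder { return &encoder{tables: map[string]*tableState{}} }

func (e *encoder) state(encodedTable string) *tableState {
	st, ok := e.tables[encodedTable]
	if !ok {
		st = &tableState{}
		e.tables[encodedTable] = st
	}
	return st
}

func (e *encoder) HandleItem(encodedTable, item string) {
	st := e.state(encodedTable)
	st.items = append(st.items, item)
}

func (e *encoder) HandleSchema(encodedTable, schema string) {
	s := schema
	e.state(encodedTable).schema = &s
}

// Finalize flushes tables whose schema arrived; orphan tables only count.
func (e *encoder) Finalize() (flushedItems, orphanedTables int) {
	for _, st := range e.tables {
		if st.schema == nil {
			orphanedTables++
			continue
		}
		flushedItems += len(st.items)
	}
	return
}

func main() {
	e := newEncoder()
	e.HandleItem("dGJs", `{"pk":{"S":"a"}}`) // item arrives first
	e.HandleSchema("dGJs", `{"TableName":"tbl"}`)
	e.HandleItem("b3JwaGFu", `{"pk":{"S":"x"}}`) // schema never arrives
	f, o := e.Finalize()
	fmt.Println(f, o) // → 1 1
}
```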

All 10 attribute kinds covered

S / N / B / BOOL / NULL / SS / NS / BS / L / M translated to their AWS-DynamoDB-JSON shapes. Empty oneof (malformed proto value) surfaces as NULL=true so the dump remains deserialisable. attributeValueToPublic is split into scalar/set/composite helpers to keep cyclomatic complexity under the package cap.
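A minimal sketch of the scalar half of that translation, with a small tagged struct standing in for the proto oneof (the real code dispatches on pb.DynamoAttributeValue; everything here is illustrative):

```go
package main

import "fmt"

// attr stands in for the proto value; kind "" models an empty oneof.
type attr struct {
	kind string // "S", "N", "BOOL", ...
	s    string
	b    bool
}

// scalarToPublic renders the AWS-DynamoDB-JSON shape for scalar kinds.
func scalarToPublic(a attr) map[string]any {
	switch a.kind {
	case "S":
		return map[string]any{"S": a.s}
	case "N":
		return map[string]any{"N": a.s} // numbers travel as strings in DynamoDB JSON
	case "BOOL":
		return map[string]any{"BOOL": a.b}
	}
	// Empty oneof (malformed proto value) surfaces as NULL=true so the
	// dump remains deserialisable rather than dropping the attribute.
	return map[string]any{"NULL": true}
}

func main() {
	fmt.Println(scalarToPublic(attr{kind: "S", s: "hello"}))
	fmt.Println(scalarToPublic(attr{}))
}
```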

Bundle mode is stubbed

WithBundleJSONL(true) makes Finalize return a clear "not implemented in this PR" error. Per-item layout (the design's documented default) is what this PR delivers.

Test plan

  • go test -race ./internal/backup/... — pass.
  • golangci-lint run ./internal/backup/... — clean.
  • 11 tests covering hash-only / composite round-trip, binary primary key, orphan items, magic-prefix rejection on schema and item paths, missing-key-attribute rejection, GSI noop, all 10 attribute kinds round-trip, empty-oneof behavior, bundle-mode stub, malformed-key rejection.

Stacking

Base: feat/backup-phase0a-sqs (PR #714).

Builds on PR #714. Adds the DynamoDB encoder for the Phase 0
logical-backup decoder.

Snapshot prefixes handled:
- !ddb|meta|table|<base64url(table)> -> dynamodb/<table>/_schema.json
  (DynamoTableSchema proto -> DescribeTable-shaped JSON; cluster-
  internal fields like key_encoding_version and migrating_from_generation
  stripped from the public projection because they're not user-visible).
- !ddb|item|<base64url(table)>|<gen>|<rest> -> per-item JSON files
  under dynamodb/<table>/items/. Hash-only tables emit
  items/<pk>.json; composite-key tables emit items/<pk>/<sk>.json.
- !ddb|gsi|... ignored (derivable from items + schema; replaying GSI
  rows on restore would conflict with the destination's own index
  maintenance).
- !ddb|meta|gen|... ignored (operational counter, not user state).

Implementation choices:
- Lex order ('i' < 'm') means items arrive before the table schema.
  Encoder buffers per encoded-table-segment and emits at Finalize
  once the schema is known, parallel to the SQS encoder's strategy.
- Table-segment parsing is unambiguous: base64url alphabet contains
  no '|', so the first '|' after the prefix is the table/gen
  separator. No heuristic boundary detection needed.
- Item filename derivation reads the hash and range key NAMES from
  the schema, then looks them up in the item's attributes map. A
  missing required-key attribute on an item is a structural error
  (it could never have been GetItem-able) and surfaces as
  ErrDDBInvalidItem on Finalize.
- B-attribute (binary) primary keys take EncodeBinarySegment so they
  cannot collide with hex-shaped string keys; matches the design's
  "binary keys take b64.<base64url>" rule.
- All 10 documented attribute kinds (S, N, B, BOOL, NULL, SS, NS,
  BS, L, M) are translated to their AWS-DynamoDB-JSON shapes.
  Empty oneof (a malformed proto value) surfaces as NULL=true so
  the dump remains deserialisable.
- attributeValueToPublic is split into scalar / set / composite
  helpers so the cyclomatic complexity stays under the package cap.
- Bundle mode (--dynamodb-bundle-mode jsonl) is left as a stub:
  WithBundleJSONL(true) makes Finalize return a clear "not
  implemented in this PR" error so the master pipeline can
  surface the capability gap. Per-item layout is the documented
  default.

Tests cover hash-only and composite-key round-trips, binary primary
key rendering, orphan items without schema, value-magic rejection
on schema and item paths, missing-key-attribute rejection at
Finalize, GSI rows ignored, all 10 attribute kinds round-trip
through JSON, empty-oneof -> NULL, bundle-mode stub, malformed
table-segment key rejection.
@coderabbitai

coderabbitai Bot commented Apr 30, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8572222e-b176-4e5b-a220-6105f7802441



Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a DynamoDB encoder designed to export table schemas and items into a structured filesystem layout for logical backups. The implementation handles the conversion of internal protobuf representations into AWS-compatible JSON formats, supporting various DynamoDB attribute types and secondary indexes. Feedback suggests adopting a fail-fast error handling strategy during the finalization phase when flushing tables and refining the serialization of set attributes to ensure consistent JSON output for empty or nil collections.

Comment thread internal/backup/dynamodb.go Outdated
Comment on lines +170 to +182
var firstErr error
for _, st := range d.tables {
    if st.schema == nil {
        d.emitWarn("ddb_orphan_items",
            "encoded_table", st.encoded,
            "buffered_items", len(st.items))
        continue
    }
    if err := d.flushTable(st); err != nil && firstErr == nil {
        firstErr = err
    }
}
return firstErr
Contributor


medium

The current error handling in Finalize correctly implements a lenient path for incomplete entries (orphan items) by emitting a warning instead of failing or dropping them. This aligns with the repository rule to avoid silently dropping entries during serialization to ensure data preservation on disk. For actual flush errors, failing fast is appropriate.

Suggested change
-var firstErr error
-for _, st := range d.tables {
-    if st.schema == nil {
-        d.emitWarn("ddb_orphan_items",
-            "encoded_table", st.encoded,
-            "buffered_items", len(st.items))
-        continue
-    }
-    if err := d.flushTable(st); err != nil && firstErr == nil {
-        firstErr = err
-    }
-}
-return firstErr
+for _, st := range d.tables {
+    if st.schema == nil {
+        d.emitWarn("ddb_orphan_items",
+            "encoded_table", st.encoded,
+            "buffered_items", len(st.items))
+        continue
+    }
+    if err := d.flushTable(st); err != nil {
+        return err
+    }
+}
+return nil
References
  1. Avoid silently dropping entries during serialization or normalization. If placeholder or incomplete entries are valid in some contexts, provide a specific, more lenient handling path for them to ensure they are preserved on disk and avoid state inconsistencies.

Comment on lines +432 to +442
func setAttributeValueToPublic(av *pb.DynamoAttributeValue) map[string]any {
    switch v := av.GetValue().(type) {
    case *pb.DynamoAttributeValue_Ss:
        return map[string]any{"SS": append([]string{}, v.Ss.GetValues()...)}
    case *pb.DynamoAttributeValue_Ns:
        return map[string]any{"NS": append([]string{}, v.Ns.GetValues()...)}
    case *pb.DynamoAttributeValue_Bs:
        return map[string]any{"BS": append([][]byte{}, v.Bs.GetValues()...)}
    }
    return nil
}
Contributor


medium

The current implementation of setAttributeValueToPublic can produce null in the JSON output for nil set slices. To avoid state inconsistencies during normalization and ensure entries are correctly preserved, handle nil set slices to produce an empty JSON array [] instead of null.

Suggested change
-func setAttributeValueToPublic(av *pb.DynamoAttributeValue) map[string]any {
-    switch v := av.GetValue().(type) {
-    case *pb.DynamoAttributeValue_Ss:
-        return map[string]any{"SS": append([]string{}, v.Ss.GetValues()...)}
-    case *pb.DynamoAttributeValue_Ns:
-        return map[string]any{"NS": append([]string{}, v.Ns.GetValues()...)}
-    case *pb.DynamoAttributeValue_Bs:
-        return map[string]any{"BS": append([][]byte{}, v.Bs.GetValues()...)}
-    }
-    return nil
-}
+func setAttributeValueToPublic(av *pb.DynamoAttributeValue) map[string]any {
+    switch v := av.GetValue().(type) {
+    case *pb.DynamoAttributeValue_Ss:
+        return map[string]any{"SS": append([]string(nil), v.Ss.GetValues()...)}
+    case *pb.DynamoAttributeValue_Ns:
+        return map[string]any{"NS": append([]string(nil), v.Ns.GetValues()...)}
+    case *pb.DynamoAttributeValue_Bs:
+        return map[string]any{"BS": append([][]byte(nil), v.Bs.GetValues()...)}
+    }
+    return nil
+}
References
  1. Avoid silently dropping entries during serialization or normalization to ensure they are preserved on disk and avoid state inconsistencies.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 63bfe8db9b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/backup/dynamodb.go Outdated
Comment on lines +233 to +237
enc := rest[:idx]
if _, err := base64.RawURLEncoding.DecodeString(enc); err != nil {
return "", errors.Wrap(ErrDDBMalformedKey, err.Error())
}
return enc, nil


P1 Badge Preserve item generation when grouping DynamoDB rows

Item keys encode !ddb|item|<table>|<generation>|..., but this parser drops everything after the first separator and the encoder buffers rows only by table name. In real clusters, old-generation rows can coexist temporarily (e.g., delete/recreate before async cleanup finishes), and those stale rows will be emitted under the current schema; this can silently restore deleted data or make Finalize fail when primary-key names changed across generations. The backup path should retain generation and filter rows against the active schema generation (and any explicitly supported migration generation) instead of merging all generations.


bootjp added 2 commits April 30, 2026 19:33
Three issues, all valid.

#237 Codex P1 -- preserve item generation when grouping rows.
Item keys carry !ddb|item|<table>|<generation>|... but the prior
parser dropped the generation and Finalize merged all generations
under the active schema. In real clusters, in-flight delete/recreate
cleanup leaves stale-gen rows visible in the snapshot for a window;
those rows would silently restore as items under the new schema
(potentially resurrecting deleted data, or failing Finalize when
PK names changed across generations).

Fix:
- parseDDBItemKey now returns (encodedTable, generation).
- ddbTableState.items -> itemsByGen map[uint64][]*pb.DynamoItem.
- flushTable filters by st.schema.GetGeneration() and emits a
  ddb_stale_generation_items warning carrying the count and active
  gen so the operator can correlate the orphan window.

#182 Gemini -- fail-fast on flush errors.
Finalize previously deferred the first error and continued through
remaining tables. Real flush errors (out of disk, bad permissions)
should surface immediately, not be misattributed to a later table
whose flush also fails. Orphan-table warnings still continue (those
are informational, not errors).

#442 Gemini -- empty set serializes as [] not null.
setAttributeValueToPublic now uses make+append so nil/empty SS/NS/BS
become "{\"SS\":[]}" rather than "{\"SS\":null}". Downstream tools
that distinguish "present-but-empty" from "missing" no longer see
a misleading null.

Tests: TestDDB_StaleGenerationItemsExcludedAndWarned,
TestDDB_EmptyStringSetSerializesAsEmptyArrayNotNull,
TestDDB_ParseItemKeyExtractsGeneration. Existing fixtures updated
to set Generation=1 on schemas (matching the item key gen).
@bootjp
Owner Author

bootjp commented Apr 30, 2026

3 issues addressed in fdc9b36b:

Codex P1 #237 — Preserve item generation when grouping rows
The most important one. parseDDBItemKey now returns (encodedTable, generation); ddbTableState.items → itemsByGen map[uint64][]*pb.DynamoItem. flushTable filters by st.schema.GetGeneration() and emits a ddb_stale_generation_items warning for stale rows. Stale-gen rows from in-flight delete/recreate cleanup no longer silently restore under the new schema.

Gemini MEDIUM #182 — Fail fast on flush errors
Finalize previously deferred the first error. Now real flush errors (out of disk, bad permissions) surface immediately, not misattributed to a later table. Orphan-table warnings still continue (informational, not errors).

Gemini MEDIUM #442 — Empty set → [] not null
setAttributeValueToPublic uses make + append so nil/empty SS/NS/BS render as {"SS":[]} rather than {"SS":null}. Downstream tools that distinguish "present-but-empty" from "missing" no longer see a misleading null.

Tests: TestDDB_StaleGenerationItemsExcludedAndWarned, TestDDB_EmptyStringSetSerializesAsEmptyArrayNotNull, TestDDB_ParseItemKeyExtractsGeneration.

@claude review

@claude

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 3m 30s —— View job


Review: fdc9b36 — backup: address review on DynamoDB encoder (PR #716)

  • Read PR context and understand request
  • Read internal/backup/dynamodb.go
  • Read internal/backup/dynamodb_test.go
  • Diff against base branch (origin/feat/backup-phase0a-sqs)
  • Review: Data loss lens
  • Review: Concurrency/distributed failures lens
  • Review: Performance lens
  • Review: Data consistency lens
  • Review: Test coverage lens

All three review comments are correctly addressed. Details per lens below.


Lens 1 — Data loss

All file writes go through writeFileAtomic (tmp+rename), so no partial-file risk. os.MkdirAll errors propagate correctly. The active := st.itemsByGen[activeGen] map lookup returns nil for a table with no items at activeGen, so range nil is a no-op — a schema-only table emits _schema.json with zero item files, which is correct.

No data loss issues.


Lens 2 — Concurrency / distributed failures

DDBEncoder has no locking. The Handle-then-Finalize lifecycle is documented as sequential, which is the right contract for an offline backup pipeline. No issues.


Lens 3 — Performance

totalItemsAcrossGens and totalStaleItems are O(items) and only called at Finalize time. ensureItemsByGen (line 73) is called on every HandleItem invocation, returning the existing map unchanged after the first call. This is a redundant nil-check on the hot path for high-item-count tables — an inline if st.itemsByGen == nil { st.itemsByGen = make(...) } in HandleItem is slightly cleaner, but this is a nit with no practical impact at the item counts a backup pipeline would see.

No performance issues.


Lens 4 — Data consistency

Codex P1 fix is correct. parseDDBItemKey extracts (encodedTable, generation) unambiguously — base64url doesn't contain |, so the two |-splits after the prefix are reliable separators. flushTable correctly restricts to st.itemsByGen[activeGen] and warns on any gen != activeGen.

One unaddressed edge case worth documenting: MigratingFromGeneration.

DynamoTableSchema has a migrating_from_generation field (proto field 6). During a live key-encoding migration — not a delete/recreate — items may legitimately coexist at migrating_from_generation while the migration converges. A backup taken mid-migration would count those items as stale and emit a ddb_stale_generation_items warning but exclude them from the dump. This is conservative (no resurrection risk), but an operator restoring from such a backup would silently lose the in-flight items.

Phase 0a is an offline snapshot decoder and the risk is low (migrations are presumably rare and the operator can re-snapshot after convergence). However, the current comment in flushTable only describes the delete/recreate scenario. It would be worth either:

  • Adding a note that items at migratingFromGeneration are also excluded and will appear in the warning count, or
  • Emitting a separate ddb_migration_in_progress warning when schema.GetMigratingFromGeneration() != 0 so the operator knows to re-snapshot.


Non-deterministic _schema.json output. schemaToPublic iterates s.GetAttributeDefinitions() and s.GetGlobalSecondaryIndexes() (both Go maps) to build the attribute_definitions and global_secondary_indexes slices. The resulting JSON arrays have non-deterministic element order across runs. This doesn't affect correctness (JSON arrays of objects are compared by content, not order), but it makes diff-based backup comparison noisy. A sort.Slice by Name would make output stable and reproducible with zero cost.



Lens 5 — Test coverage

Three new tests look good:

  • TestDDB_StaleGenerationItemsExcludedAndWarned — covers the primary Codex fix path, confirms stale item not written and warning fires.
  • TestDDB_EmptyStringSetSerializesAsEmptyArrayNotNull — directly exercises the Gemini #442 fix; uses json.Marshal to verify wire output.
  • TestDDB_ParseItemKeyExtractsGeneration — unit tests the new return value of parseDDBItemKey.

One gap: TestDDB_StaleGenerationItemsExcludedAndWarned has one stale generation (gen=4) and one active (gen=5). A follow-up test with multiple stale generations (e.g., gen=3 and gen=4 both present, active gen=5) would confirm totalStaleItems aggregates across all non-active gens and that stale_count in the warning reflects the combined count.


Summary

Finding Severity
Codex P1: generation filtering ✅ Correctly fixed
Gemini #182: fail-fast flush errors ✅ Correctly fixed
Gemini #442: empty set → [] ✅ Correctly fixed
MigratingFromGeneration excluded without distinct warning Low — conservative but silent
Non-deterministic _schema.json ordering Low — readability / diffability
Single-stale-gen test only Low — coverage gap

The core correctness fixes are solid. The two "Low" findings would improve operator experience but don't affect backup safety. Happy to address any of them if you want them in this PR.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fdc9b36b6b


Comment thread internal/backup/dynamodb.go Outdated
"stale_count", stale,
"hint", "stale-gen rows are excluded from the dump; restore would otherwise emit them under the new schema")
}
active := st.itemsByGen[activeGen]


P1 Badge Include migration source generation when emitting DynamoDB items

flushTable now emits only st.itemsByGen[activeGen], but DynamoDB tables can be in a live migration state where schema.generation is new and schema.migrating_from_generation still contains readable data; the read path explicitly falls back to that source generation during migration (see readLogicalItemAt in adapter/dynamodb.go). In that state, this filter silently drops unmigrated rows from backups, so restoring the dump loses logically visible items if a snapshot is taken mid-migration. Fresh evidence in this revision is the hard filter at this line with no migrating_from_generation inclusion path.


bootjp added 5 commits April 30, 2026 19:58
Two P1 follow-ups from Codex.

#100 Codex P1 -- overbroad fast path.
The previous len(raw)*3 > 240 fast path tripped for any input
longer than 80 bytes, including pure-ASCII keys whose 1:1 encoded
form would have fit fine. That converted reversible keys into
ErrShaFallbackNeedsKeymap unnecessarily. Replaced with:
  1. len(raw) > maxSegmentBytes -> SHA fallback (1:1 lower bound;
     no encoding can fit).
  2. percentEncodeBounded: stream-encodes with an in-loop overflow
     check so the partial allocation is bounded to maxSegmentBytes
     even on adversarial inputs that DO need escaping. Returns
     ("", false) on overflow so the caller takes the SHA path
     without seeing the partial output.
TestEncodeSegment_LongUnreservedASCIIEncodesAsIs locks the
correct 200-byte-ASCII round-trip.

#146 Codex P1 -- b64. prefix collision.
A user STRING key like "b64.foo" was returned as-is by
EncodeSegment (all unreserved) and then misclassified by
DecodeSegment as a binary segment, decoding the base64 to "foo"
instead of round-tripping. EncodeSegment now promotes any input
whose percent-encoded form starts with the binary prefix to a
real SHA fallback, parallel to the existing SHA-shape collision
check, so KEYMAP.jsonl carries the original bytes.
TestEncodeSegment_KeyStartingWithBinaryPrefixIsPromotedToFallback
covers it.

The previous huge-input OOM-guard property (no all-at-once
3*len(raw) allocation) is preserved by percentEncodeBounded; the
existing TestEncodeSegment_HugeInputDoesNotMaterialiseFullExpansion
still passes.
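The bounded stream-encoding described above can be sketched as follows. `maxSegmentBytes = 240` and the unreserved-byte set are illustrative assumptions taken from the commit text; the real implementation also SHA-falls-back up front when `len(raw) > maxSegmentBytes`:

```go
package main

import "fmt"

const maxSegmentBytes = 240 // illustrative cap on the encoded form

// isUnreserved approximates the bytes that pass through 1:1 unescaped.
func isUnreserved(b byte) bool {
	return b >= 'a' && b <= 'z' || b >= 'A' && b <= 'Z' ||
		b >= '0' && b <= '9' || b == '-' || b == '_' || b == '.'
}

// percentEncodeBounded stream-encodes with an in-loop overflow check so
// the partial allocation never exceeds maxSegmentBytes, even for
// adversarial inputs that do need escaping. Returns ("", false) on
// overflow so the caller can take the SHA-fallback path without ever
// seeing the partial output.
func percentEncodeBounded(raw []byte) (string, bool) {
	n := len(raw)
	if n > maxSegmentBytes {
		n = maxSegmentBytes
	}
	out := make([]byte, 0, n) // bounded initial allocation
	const hex = "0123456789ABCDEF"
	for _, b := range raw {
		if isUnreserved(b) {
			out = append(out, b)
		} else {
			out = append(out, '%', hex[b>>4], hex[b&0x0F])
		}
		if len(out) > maxSegmentBytes {
			return "", false
		}
	}
	return string(out), true
}

func main() {
	ascii := make([]byte, 200)
	for i := range ascii {
		ascii[i] = 'a'
	}
	s, ok := percentEncodeBounded(ascii)
	fmt.Println(len(s), ok) // → 200 true: 200-byte pure-ASCII key encodes 1:1
}
```

This preserves the round-trip for long unreserved-ASCII keys that the old `len(raw)*3 > 240` fast path wrongly sent to the SHA fallback.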
, round 2)

Codex P1 #295: validation only rejected source on phase1 manifests
but accepted a phase1 dump with live entirely absent. A phase1
dump's whole point is the cluster-wide read_ts pin recorded under
Live; a manifest that omits Live cannot describe its consistency
point and downstream restore/audit logic must not silently treat
it as valid.

validatePhaseSpecific now requires Live != nil AND
Live.ReadTS != 0 for phase1 dumps. Tests:
TestManifest_Phase1MustSetLive,
TestManifest_Phase1RejectsZeroReadTS,
TestManifest_Phase1WithLiveAndNonZeroReadTSIsValid.
… round 2)

Codex P1 #227: during a generation migration, schema.Generation
is the new gen and schema.MigratingFromGeneration carries the
source. The live read path (adapter/dynamodb.go readLogicalItemAt)
falls back to the source for items not yet copied. The previous
flushTable filtered to only schema.GetGeneration() and silently
dropped unmigrated rows from the dump.

flushTable now emits items from BOTH generations. Order matters:
the source is emitted FIRST, then the active gen, so when both
generations carry the same primary key the active-gen content is
the one that survives writeFileAtomic's tmp+rename — matching the
live code's "new gen wins on conflict" rule.

The stale-generation warning was generalised to take an emit-set
(totalStaleItemsExcluding); rows in any other generation are
counted into the warning so the operator still sees the orphan
window.

Tests: TestDDB_MigrationSourceGenerationItemsAreEmitted,
TestDDB_NewGenerationWinsOverMigrationSourceForSameKey.
@bootjp
Owner Author

bootjp commented Apr 30, 2026

Codex P1 #227 addressed in ce5b7dac. During a generation migration, schema.Generation is the new gen and schema.MigratingFromGeneration carries the source. The live read path (adapter/dynamodb.go readLogicalItemAt) falls back to the source for items not yet copied; my previous filter to activeGen only would silently drop unmigrated rows.

flushTable now emits from both generations. The source is emitted first, then the active gen, so when both carry the same primary key the active-gen content is the one that survives writeFileAtomic's tmp+rename — matching the live code's "new gen wins on conflict" rule.

The stale-generation warning was generalised (totalStaleItemsExcluding) so rows in any OTHER generation are still counted and surfaced.
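The "new gen wins on conflict" flush order can be modeled with a map standing in for the filesystem: writeFileAtomic's tmp+rename overwrite behaves like a map assignment, so emitting the source generation first and the active generation last leaves the active-gen content on disk. All names here are illustrative:

```go
package main

import "fmt"

// flush emits items generation by generation; later writes to the same
// primary key overwrite earlier ones, modeling the atomic tmp+rename.
func flush(itemsByGen map[uint64]map[string]string, activeGen, sourceGen uint64) map[string]string {
	disk := map[string]string{}
	var order []uint64
	if sourceGen != 0 && sourceGen != activeGen {
		order = append(order, sourceGen) // migration source first
	}
	order = append(order, activeGen) // active gen last: wins collisions
	for _, gen := range order {
		for pk, body := range itemsByGen[gen] {
			disk[pk] = body
		}
	}
	return disk
}

func main() {
	itemsByGen := map[uint64]map[string]string{
		6: {"k1": "old", "k2": "only-in-source"}, // migrating-from gen
		7: {"k1": "new"},                         // active gen
	}
	fmt.Println(flush(itemsByGen, 7, 6)) // → map[k1:new k2:only-in-source]
}
```

On a key collision the active-gen body survives, matching readLogicalItemAt's "try active gen, fall back to source" preference, while source-only items are still preserved in the dump.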

Tests: TestDDB_MigrationSourceGenerationItemsAreEmitted (both gens emitted), TestDDB_NewGenerationWinsOverMigrationSourceForSameKey (collision resolves to new-gen content).

@claude review

@claude

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 4m 21s —— View job


Review: ce5b7da — backup: emit migrating-from-generation items in DynamoDB dump (PR #716, round 2)

  • Read PR context and understand request
  • Read internal/backup/dynamodb.go
  • Read internal/backup/dynamodb_test.go
  • Read adapter/dynamodb.go (readLogicalItemAt) for alignment check
  • Diff against base branch (origin/feat/backup-phase0a-sqs)
  • Review: Data loss lens
  • Review: Concurrency/distributed failures lens
  • Review: Performance lens
  • Review: Data consistency lens
  • Review: Test coverage lens

Codex P1 #227 is correctly addressed. Details per lens below.


Lens 1 — Data loss

writeFileAtomic is tmp+rename, so two writes to the same path are race-free and the last writer wins on disk. The emit order (source gen first, active gen last) ensures the active gen content is the survivor on any key collision — matching the live read preference exactly.

One subtlety verified: parseDDBItemKey is correct for both legacy and ordered key encodings. The ordered encoding (dynamoItemKey) puts binary key data after the second |, but parseDDBItemKey stops at the second | — it never touches the binary suffix. The generation is decimal digits, so the second | in the key is unambiguous.

No data loss issues.


Lens 2 — Concurrency / distributed failures

Sequential Handle-then-Finalize contract, no locking needed. No issues.


Lens 3 — Performance

totalStaleItemsExcluding allocates a small map[uint64]struct{} with at most 2 entries (activeGen + migrationSourceGen) per flushTable call. Cost is O(distinct generations), not O(items). No issues.


Lens 4 — Data consistency

The emit order correctly mirrors readLogicalItemAt:

readLogicalItemAt:  try activeGen → found? return it. else try sourceGen.
flushTable:         emit sourceGen first → emit activeGen last → activeGen wins via rename.

Both paths agree: new gen wins on conflict.

migrationSourceSchema() alignment check. The live migrationSourceSchema() sets KeyEncodingVersion: 0 for the source gen, but copies PrimaryKey and AttributeDefinitions unchanged from the active schema. Since flushTable uses hashKey/rangeKey from the active schema (st.schema.GetPrimaryKey()), and both gens share the same PK attribute names, writeDDBItem correctly derives file paths for items from either gen.

emitOrder edge cases correct:

  • migrationSourceGen == 0 → no-migration state → emitOrder = [activeGen] only ✅
  • migrationSourceGen == activeGen → guarded by migrationSourceGen != activeGen → no duplicate ✅

No data consistency issues.


Lens 5 — Test coverage

Two new tests cover the core fix correctly:

  • TestDDB_MigrationSourceGenerationItemsAreEmitted — distinct PKs in each gen; verifies both files land on disk ✅
  • TestDDB_NewGenerationWinsOverMigrationSourceForSameKey — same PK in both gens; verifies v.S == "new-version" survives ✅

One coverage gap — three-generation scenario:

TestDDB_StaleGenerationItemsExcludedAndWarned uses a schema with MigratingFromGeneration == 0, so emitOrder == [5] and the stale item at gen=4 is excluded. No test exercises the case where all three are present: migrationSourceGen (included), activeGen (included), and an even-older gen (stale + warned). A table in mid-migration that also has cleanup debris from a prior delete/recreate would hit this path. totalStaleItemsExcluding is correct by inspection, but this scenario is untested.


Carried over from prior round — still unaddressed:

schemaToPublic iterates GetAttributeDefinitions() and GetGlobalSecondaryIndexes() (Go maps) to build the output slices. The _schema.json arrays have non-deterministic element order across runs, making diff-based backup comparison noisy.



Summary

Finding Severity
Codex P1 #227: include migration-source-gen items ✅ Correctly fixed
Emit order matches live readLogicalItemAt preference ✅ Verified
parseDDBItemKey correct for ordered encoding ✅ Verified
Same-PK collision resolves to active gen via tmp+rename ✅ Verified
Three-generation scenario (stale + source + active) untested Low
Non-deterministic _schema.json output (carried from prior round) Low

Core correctness is solid. The two Low findings are improvements for operator experience/testability, not correctness gaps.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ce5b7dac74


Comment on lines +116 to +117
rawName, err := base64.RawURLEncoding.DecodeString(encoded)
if err != nil {


P2 Badge Reject empty table segment in DynamoDB meta keys

HandleTableMeta accepts !ddb|meta|table| as valid because base64.RawURLEncoding.DecodeString("") succeeds, so a truncated/malformed key is treated as table name "" instead of returning ErrDDBMalformedKey. In corrupted snapshots this can route schema output into the wrong scope and mask key corruption that Phase 0 is supposed to surface loudly.


Comment on lines +299 to +303
genEnd := strings.IndexByte(afterTable, '|')
if genEnd <= 0 {
return "", 0, errors.Wrapf(ErrDDBMalformedKey,
"item key missing gen/rest separator: %q", key)
}

P2: Reject item keys with no payload after generation

parseDDBItemKey only checks that a | exists after the generation, but it does not require bytes after that separator, so keys like !ddb|item|<table>|7| are accepted. This lets truncated item keys bypass malformed-key detection and be emitted using only value-side attributes, which can hide snapshot corruption instead of failing fast.


bootjp added 3 commits April 30, 2026 20:09

Codex P1 #205 (round 2): DisallowUnknownFields broke the
documented same-major minor-evolution path. A newer producer
that adds an optional field at the same format_version turns
older readers into hard read failures during mixed-version
operation, even though both nodes are on the "same major" by
contract.

The format_version field IS the breaking-change signal. Major
bumps are gated by ErrUnsupportedFormatVersion; same-major
minor evolution must silently tolerate optional new fields.

Removed dec.DisallowUnknownFields() from ReadManifest. Replaced
TestReadManifest_RejectsUnknownFields with
TestReadManifest_AcceptsUnknownFieldsForSameMajorMinorEvolution.
The trailing-bytes guard (dec.More() check) is preserved -- it
catches concatenation bugs, which are a different concern from
optional-field tolerance.
Two Codex P2 strict-validation follow-ups.

#117 -- empty table-meta segment.
HandleTableMeta accepted `!ddb|meta|table|` (no encoded segment)
because base64.RawURLEncoding.DecodeString("") returns empty bytes
without error, so the schema would route under the empty table
name. Now rejected with ErrDDBMalformedKey before the JSON decode.
TestDDB_RejectsTableMetaKeyWithEmptySegment.

#303 -- truncated item key.
parseDDBItemKey accepted `!ddb|item|<table>|7|` (gen separator
present, no primary-key payload). The gen-end check was
"genEnd > 0" which a trailing `|` satisfies. Added a follow-up
check that genEnd+1 != len(afterTable) so a payload-less key
surfaces as ErrDDBMalformedKey rather than being emitted from
value-side attributes only. TestDDB_RejectsItemKeyWithEmptyPrimaryKeyPayload.
@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews.

@bootjp
Owner Author

bootjp commented Apr 30, 2026

2 Codex P2 strict-validation follow-ups addressed in 9029addb:

#117 — Empty table-meta segment. HandleTableMeta accepted !ddb|meta|table| because base64.RawURLEncoding.DecodeString("") returns []byte{} without error. Now rejected with ErrDDBMalformedKey before the JSON decode. TestDDB_RejectsTableMetaKeyWithEmptySegment.

#303 — Truncated item key. parseDDBItemKey accepted !ddb|item|<table>|7| (gen separator present but no primary-key payload). Added a genEnd+1 != len(afterTable) guard so a payload-less key surfaces as ErrDDBMalformedKey. TestDDB_RejectsItemKeyWithEmptyPrimaryKeyPayload.

@claude review

@claude

claude Bot commented Apr 30, 2026

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 2m 51s.


Review: 9029add — backup: reject truncated DynamoDB keys (PR #716, round 3)

  • Read PR context and understand request
  • Read internal/backup/dynamodb.go
  • Read internal/backup/dynamodb_test.go
  • Diff against base branch (origin/feat/backup-phase0a-sqs)
  • Review: Data loss lens
  • Review: Concurrency/distributed failures lens
  • Review: Performance lens
  • Review: Data consistency lens
  • Review: Test coverage lens

Both P2 fixes are correct. Details per lens below.


Lens 1 — Data loss

Fix #1 — empty table-meta segment (line 116–122):

stripPrefixSegment("!ddb|meta|table|", "!ddb|meta|table|") returns ("", nil) — no error — because the function simply returns string(key[len(prefix):]). Without the new guard, base64.RawURLEncoding.DecodeString("") also returns ([]byte{}, nil), so the table name resolves to "". EncodeSegment([]byte("")) returns "", so the flush path becomes:

filepath.Join(root, "dynamodb", "", "_schema.json")
  → root/dynamodb/_schema.json

That file lands at the dynamodb/ root, silently overwriting any other _schema.json written there by the same path on a different table (impossible for valid tables, but the behavior is ambiguous garbage). The new guard correctly catches this before any decode or unmarshal. Fix is correct.

Fix #2 — empty primary-key payload (line 315–324):

Without the guard, !ddb|item|<t>|7| passes all prior checks, buffers a DynamoItem from the value, and defers to Finalize. At flush time, writeDDBItem looks up the hash-key attribute in the item's attribute map and derives a path from the value — never from the key payload — so the truncated key itself doesn't produce a file at a wrong path. The real risk is the diagnostic one: a corrupted key silently produces output shaped entirely by the value side, masking the key corruption. The guard surfaces it at HandleItem time with a clear ErrDDBMalformedKey. Fix is correct.

No data loss issues.


Lens 2 — Concurrency / distributed failures

Sequential Handle-then-Finalize contract, no concurrency implications. No issues.


Lens 3 — Performance

Both new checks are O(1) comparisons (encoded == "" and genEnd+1 == len(afterTable)). No impact.


Lens 4 — Data consistency

Trace through both test cases:

TestDDB_RejectsTableMetaKeyWithEmptySegment:

  • stripPrefixSegment("!ddb|meta|table|", "!ddb|meta|table|") → ""
  • encoded == "" guard fires → ErrDDBMalformedKey

TestDDB_RejectsItemKeyWithEmptyPrimaryKeyPayload (key = !ddb|item|dA|7|):

  • rest = "dA|7|" → tableEnd = 2 → enc = "dA" → valid base64url
  • afterTable = "7|" → genEnd = 1 → gen = 7
  • genEnd+1 == len(afterTable) → 2 == 2 → ErrDDBMalformedKey

One subtle point worth noting: tableEnd <= 0 (existing guard, line 297) catches the case where an item key has an empty encoded-table segment (e.g., !ddb|item||7|...), since IndexByte returns 0 for a leading |. The new guard for empty item-key payload is complementary: it catches the case where the table and gen are present but the payload after the final | is empty. Together they cover all three truncation points in the <table>|<gen>|<payload> structure.

No data consistency issues.


Lens 5 — Test coverage

Both new tests are correct and well-targeted:

  • TestDDB_RejectsTableMetaKeyWithEmptySegment — passes the bare prefix as the key, exercises the encoded == "" guard, asserts errors.Is(err, ErrDDBMalformedKey)
  • TestDDB_RejectsItemKeyWithEmptyPrimaryKeyPayload — manually constructs !ddb|item|dA|7|, exercises the genEnd+1 == len(afterTable) guard, asserts errors.Is(err, ErrDDBMalformedKey)

Carried from prior rounds — still unaddressed (both Low):

  1. Three-generation scenario untested. totalStaleItemsExcluding correctly handles multiple non-included gens by inspection, but no test exercises stale_count when more than one generation is stale (e.g., schema gen=7, source gen=6, stale item at gen=5). TestDDB_StaleGenerationItemsExcludedAndWarned covers only the one-stale-gen path.


  2. Non-deterministic _schema.json output. schemaToPublic iterates Go maps (GetAttributeDefinitions() and GetGlobalSecondaryIndexes()), so attribute_definitions and global_secondary_indexes arrays have random element order across runs. Diff-based backup verification will see spurious diffs.



Summary

| Finding | Severity |
| --- | --- |
| Codex P2 #117: empty table-meta segment rejected | ✅ Correctly fixed |
| Codex P2 #303: empty item-key payload rejected | ✅ Correctly fixed |
| Both guards trace correctly for normal + truncated inputs | ✅ Verified |
| tableEnd <= 0 + new genEnd+1 guards cover all truncation points | ✅ Verified |
| Three-generation scenario (stale + source + active) untested | Low (carried from round 2) |
| Non-deterministic _schema.json output | Low (carried from round 2) |

This PR is ready to merge from a correctness standpoint. The two Low items are quality-of-life improvements (testability and backup diff stability) that can land in a follow-up.

