
backup: add Redis hash encoder #725

Open
bootjp wants to merge 8 commits into feat/backup-phase0a-redis-simple from feat/backup-phase0a-redis-hash

Conversation


bootjp (Owner) commented May 1, 2026

Summary

  • First of the Phase 0a wide-column encoders. Translates !hs|meta| + !hs|fld| records on the live store into one hashes/<encoded>.json file per Redis hash, per the design at docs/design/2026_04_29_proposed_snapshot_logical_decoder.md:327.
  • Stacked on top of backup: Redis simple-type encoders (strings, HLL, TTL routing) #713 (the strings/HLL/TTL encoder); reuses RedisDB, the SHA-fallback KEYMAP path, and the dirsCreated cache.
  • TTL is folded inline into expire_at_ms (not a sidecar) so a restorer can replay each hash in one shot. HandleTTL adds a redisKindHash route.
  • Field values reuse the UTF-8-or-base64 envelope from PR backup: SQS encoder for queue meta and messages.jsonl (Phase 0a) #714's SQS body for binary safety. Field names percent-encode (EncodeSegment) when not valid UTF-8 to keep JSON object keys as strings.
  • Empty hashes still emit a file (HLEN==0 is observable). Length mismatches between meta.Len and observed !hs|fld| keys surface as redis_hash_length_mismatch warnings.
  • Output is sorted by user key + field name for deterministic dumps (matches the round-9 sort pattern from backup: DynamoDB encoder for tables and items (Phase 0a) #716).

Self-review (per CLAUDE.md "five lenses")

  1. Data loss — empty hashes preserved; declared/observed length mismatch warned; field value bytes preserved verbatim via base64 envelope when not UTF-8; SHA-fallback paths feed KEYMAP.
  2. Concurrency — RedisDB is single-writer per the existing contract; the new hashes map follows the same per-key state model used by kindByKey/inlineTTLEmitted.
  3. Performance — fields buffered per key (Redis hashes are typically small); flushHashes runs once at Finalize. Sort is O(n log n) over user keys + per-hash field names — acceptable at dump cadence.
  4. Data consistency — kindByKey registration on both meta and field arrival prevents TTL mis-routing across snapshot record-ordering scenarios.
  5. Test coverage — 8 sub-tests: round-trip, empty hash, inline TTL routing, length mismatch warning, binary value envelope, malformed meta, truncated field key, SHA-fallback keymap.

Test plan

  • go test -race ./internal/backup/... — clean.
  • GOOS=windows GOARCH=amd64 go build ./internal/backup/... — clean.
  • GOOS=js GOARCH=wasm go build ./internal/backup/... — clean.
  • golangci-lint run — clean.

First of the Phase 0a wide-column encoders (after the redis_string.go
strings/HLL/TTL work in PR #713). Hashes follow a meta+per-field
key shape on the live store
(`!hs|meta|<len><userKey>` → `[Len(8)]`, `!hs|fld|<len><userKey><field>`
→ field bytes; see store/hash_helpers.go), so the encoder buffers
each user key's fields in memory and flushes one
`hashes/<encoded>.json` file per hash at Finalize.
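The meta+per-field key shape described above can be sketched as a small parser. This is an illustrative sketch only: parseFieldKey and fldPrefix are stand-in names, not the repo's actual identifiers, and the 4-byte length width is taken from the description above.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Illustrative layout of `!hs|fld|<len><userKey><field>`: prefix,
// 4-byte big-endian user-key length, user key bytes, then field bytes.
const fldPrefix = "!hs|fld|"

func parseFieldKey(key []byte) (userKey, field []byte, ok bool) {
	if !bytes.HasPrefix(key, []byte(fldPrefix)) {
		return nil, nil, false
	}
	rest := key[len(fldPrefix):]
	if len(rest) < 4 {
		return nil, nil, false
	}
	ukLen := binary.BigEndian.Uint32(rest[:4])
	// Compare in uint64 space so an oversized length cannot wrap on 32-bit.
	if uint64(len(rest)) < 4+uint64(ukLen) {
		return nil, nil, false
	}
	return rest[4 : 4+ukLen], rest[4+ukLen:], true
}

func main() {
	key := append([]byte(fldPrefix), 0, 0, 0, 4)
	key = append(key, "user"...)
	key = append(key, "field1"...)
	uk, f, ok := parseFieldKey(key)
	fmt.Println(string(uk), string(f), ok) // user field1 true
}
```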

Output schema matches the design
(docs/design/2026_04_29_proposed_snapshot_logical_decoder.md:327):

	{
	  "format_version": 1,
	  "fields": {"name": "alice"},
	  "expire_at_ms": null
	}

Cross-cutting decisions:

  - TTL is folded INLINE into expire_at_ms (not a separate
    sidecar) so a restorer can replay the hash in one shot.
    HandleTTL routes redisKindHash records into the hash state
    rather than the strings_ttl.jsonl path.
  - Field VALUES use the same UTF-8-or-base64 envelope introduced
    for the SQS message body in PR #714: valid UTF-8 → plain JSON
    string; non-UTF-8 → `{"base64":"..."}`. Field NAMES that are
    not valid UTF-8 are percent-encoded (EncodeSegment) for the
    JSON object key — JSON object keys must be strings.
  - Empty hashes (HLEN==0) still emit a file because their
    existence is observable to clients (HEXISTS, HLEN). Mismatched
    declared-vs-observed length surfaces as
    `redis_hash_length_mismatch` warning.
  - SHA-fallback user keys feed back through the existing
    recordIfFallback path so KEYMAP.jsonl still reverses
    non-reversible filenames.
  - Output is sorted by user key + field name for deterministic
    byte-identical dumps across runs (matches the round-9 sort
    pattern from #716 schemaToPublic).
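The UTF-8-or-base64 value envelope above can be sketched as follows. marshalValue is a hypothetical stand-in for the repo's helper, and the choice of base64.RawURLEncoding for the alphabet/padding is an assumption.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"unicode/utf8"
)

// marshalValue: valid UTF-8 becomes a plain JSON string; anything else
// becomes {"base64": "..."} so arbitrary bytes survive the round trip.
func marshalValue(b []byte) ([]byte, error) {
	if utf8.Valid(b) {
		return json.Marshal(string(b))
	}
	return json.Marshal(map[string]string{
		"base64": base64.RawURLEncoding.EncodeToString(b),
	})
}

func main() {
	text, _ := marshalValue([]byte("alice"))
	bin, _ := marshalValue([]byte{0xff, 0x00})
	fmt.Println(string(text)) // "alice"
	fmt.Println(string(bin))  // {"base64":"_wA"}
}
```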

Tests cover: round-trip, empty hash, inline TTL routing, length
mismatch warning, binary value envelope, malformed meta, truncated
field key, SHA-fallback keymap.

coderabbitai Bot commented May 1, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.



@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request implements Redis hash backup support, including handlers for hash metadata and fields, and logic to flush accumulated state to JSON files. The implementation handles binary-safe field values and provides deterministic output. Review feedback identifies a potential panic on 32-bit architectures due to unsafe integer conversion and suggests refactoring the hashState helper to centralize key-kind tracking and reduce redundant string allocations.

Comment thread internal/backup/redis_hash.go Outdated
Comment on lines +129 to +133
ukLen := int(binary.BigEndian.Uint32(b[:hashUserKeyLenSize])) //nolint:gosec // bounded by len(b)
if len(b) < hashUserKeyLenSize+ukLen {
return nil, false
}
return b[hashUserKeyLenSize : hashUserKeyLenSize+ukLen], true

Severity: high

On 32-bit architectures, converting a uint32 to int can result in a negative value if the high bit is set (e.g., for lengths > 2GB). This would bypass the subsequent length check and cause a panic when slicing b. Using uint64 for the comparison ensures safety across all architectures.

Suggested change

Before:

	ukLen := int(binary.BigEndian.Uint32(b[:hashUserKeyLenSize])) //nolint:gosec // bounded by len(b)
	if len(b) < hashUserKeyLenSize+ukLen {
		return nil, false
	}
	return b[hashUserKeyLenSize : hashUserKeyLenSize+ukLen], true

After:

	ukLen := binary.BigEndian.Uint32(b[:hashUserKeyLenSize])
	if uint64(len(b)) < uint64(hashUserKeyLenSize)+uint64(ukLen) {
		return nil, false
	}
	return b[hashUserKeyLenSize : hashUserKeyLenSize+int(ukLen)], true

Comment on lines +63 to +67
st := r.hashState(userKey)
st.declaredLen = int64(binary.BigEndian.Uint64(value)) //nolint:gosec // signed int64 by design
st.metaSeen = true
r.kindByKey[string(userKey)] = redisKindHash
return nil

Severity: medium

The update to r.kindByKey can be centralized within the hashState helper to avoid redundancy and ensure consistency across all hash-related handlers.

Suggested change

Before:

	st := r.hashState(userKey)
	st.declaredLen = int64(binary.BigEndian.Uint64(value)) //nolint:gosec // signed int64 by design
	st.metaSeen = true
	r.kindByKey[string(userKey)] = redisKindHash
	return nil

After:

	st := r.hashState(userKey)
	st.declaredLen = int64(binary.BigEndian.Uint64(value)) //nolint:gosec // signed int64 by design
	st.metaSeen = true
	return nil
References
  1. Centralize common complex logic into shared helper functions to avoid code duplication and improve maintainability.

Comment on lines +81 to +84
st := r.hashState(userKey)
st.fields[string(fieldName)] = bytes.Clone(value)
r.kindByKey[string(userKey)] = redisKindHash
return nil

Severity: medium

The update to r.kindByKey is redundant here if centralized in hashState.

Suggested change

Before:

	st := r.hashState(userKey)
	st.fields[string(fieldName)] = bytes.Clone(value)
	r.kindByKey[string(userKey)] = redisKindHash
	return nil

After:

	st := r.hashState(userKey)
	st.fields[string(fieldName)] = bytes.Clone(value)
	return nil
References
  1. Centralize common complex logic into shared helper functions to avoid code duplication and improve maintainability.

Comment on lines +88 to +95
func (r *RedisDB) hashState(userKey []byte) *redisHashState {
if st, ok := r.hashes[string(userKey)]; ok {
return st
}
st := &redisHashState{fields: make(map[string][]byte)}
r.hashes[string(userKey)] = st
return st
}

Severity: medium

Centralizing the kindByKey update here improves maintainability and avoids repeated map writes in the handlers. Additionally, caching the string conversion of userKey avoids redundant allocations when the state is not found.

Suggested change

Before:

	func (r *RedisDB) hashState(userKey []byte) *redisHashState {
		if st, ok := r.hashes[string(userKey)]; ok {
			return st
		}
		st := &redisHashState{fields: make(map[string][]byte)}
		r.hashes[string(userKey)] = st
		return st
	}

After:

	func (r *RedisDB) hashState(userKey []byte) *redisHashState {
		ukStr := string(userKey)
		if st, ok := r.hashes[ukStr]; ok {
			return st
		}
		st := &redisHashState{fields: make(map[string][]byte)}
		r.hashes[ukStr] = st
		r.kindByKey[ukStr] = redisKindHash
		return st
	}
References
  1. Centralize common complex logic into shared helper functions to avoid code duplication and improve maintainability.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 72e2b75a61

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/backup/redis_hash.go Outdated
Comment on lines +206 to +209
jsonName := name
if !utf8.ValidString(name) {
jsonName = EncodeSegment([]byte(name))
}

P1: Disambiguate encoded and raw hash field names

This conversion only percent-encodes non-UTF8 field names while leaving UTF-8 names unchanged, so distinct Redis fields can collapse to the same JSON key (for example, a literal field name %FF and a binary field name 0xFF). In that case fields[jsonName] overwrites one value, silently dropping data from hashes that contain both forms; Redis hash field names are binary-safe, so this is a valid input shape.
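The collision can be reproduced in a few lines. percentEncode here is a stand-in for the repo's EncodeSegment, assuming byte-wise %XX encoding for non-UTF-8 input; jsonKey mirrors the conversion logic quoted above.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// percentEncode: byte-wise %XX encoding (an assumed stand-in for EncodeSegment).
func percentEncode(b []byte) string {
	out := ""
	for _, c := range b {
		out += fmt.Sprintf("%%%02X", c)
	}
	return out
}

// jsonKey mirrors the flagged logic: only non-UTF-8 names are encoded.
func jsonKey(name []byte) string {
	if utf8.Valid(name) {
		return string(name) // UTF-8 names kept verbatim
	}
	return percentEncode(name)
}

func main() {
	literal := jsonKey([]byte("%FF")) // valid UTF-8: stays "%FF"
	binary := jsonKey([]byte{0xff})   // invalid UTF-8: encodes to "%FF"
	fmt.Println(literal == binary)    // true: two distinct fields collide
}
```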


Comment thread internal/backup/redis_hash.go Outdated
for _, name := range names {
jsonName := name
if !utf8.ValidString(name) {
jsonName = EncodeSegment([]byte(name))

P1: Avoid SHA-fallback when encoding hash field names

EncodeSegment is a filename encoder with a 240-byte limit and SHA fallback; applying it to non-UTF8 field names can emit an irreversible fallback token for long names. Because hash field names are stored only as JSON object keys and no KEYMAP is written for per-field names, restoring such hashes cannot reconstruct the original field-name bytes, causing silent key corruption for long binary field names.


…tion (PR #725, round 1)

Codex round 12 raised two P1 findings on commit 72e2b75; Gemini
raised one high and three mediums alongside.

P1: Hash field name encoding via JSON object key was lossy.
  - Distinct fields could collapse to the same JSON key: a UTF-8
    literal `%FF` and a single binary byte `0xFF` both ended up
    as `%FF` after the EncodeSegment-on-non-UTF-8 fallback,
    silently dropping data on hashes with both forms.
  - Long binary field names (>240 bytes) hit EncodeSegment's
    SHA fallback which is irreversible at this layer (no
    per-field KEYMAP), so restore could not reconstruct the
    original bytes.
Fix: change `fields` to a JSON array of typed `{name, value}`
records. Both name and value go through the same UTF-8-or-base64
envelope (marshalRedisBinaryValue, mirrors PR #714's SQS body),
so arbitrary binary bytes round-trip without collisions and
without SHA fallback.

High (Gemini): parseUserKeyLenPrefix used `int(uint32)` which
wraps to negative on 32-bit when the high bit is set, bypassing
the bounds check and causing a slice panic. Compare in `uint64`
space before converting.

Medium (Gemini, ×3): centralise `kindByKey` registration in the
hashState helper so meta/field/TTL paths agree on the kind in
one place. Audited callers — HandleHashMeta, HandleHashField, and
HandleTTL's `case redisKindHash:` arm — all reach hashState via
the same userKey identity, and the TTL caller already knows the
kind so the now-implicit kindByKey write is idempotent. No other
callers per `grep -n "r\.hashState(" internal/backup/`.

Tests:
  - TestRedisDB_HashRoundTripBasic updated to assert the array
    shape via new hashFieldArray / hashFieldByName helpers.
  - TestRedisDB_HashEmptyHashStillEmitsFile, _BinaryFieldValue
    updated likewise.
  - NEW TestRedisDB_HashBinaryFieldNameRoundTripsViaArray drives
    the exact Codex collision pair (UTF-8 `%FF` + binary 0xFF)
    and asserts both names survive distinctly with their
    expected types.

Verified: linux race tests, GOOS=windows, GOOS=js wasm, lint —
all clean.

bootjp (Owner) commented May 2, 2026

@claude review

Round 1 fixes for Codex P1 round 12 + Gemini high/medium on commit 72e2b75 (commit cacd4489):

P1 — hash field name encoding was lossy: distinct Redis fields could collapse to the same JSON object key (UTF-8 literal %FF and binary 0xFF both percent-encoded to %FF); >240-byte binary names hit EncodeSegment's SHA fallback which is irreversible at this layer. Fixed by changing fields to a JSON array of typed {name, value} records — both name and value use the UTF-8-or-base64 envelope from PR #714's SQS body, so arbitrary binary bytes round-trip without collision. Test: TestRedisDB_HashBinaryFieldNameRoundTripsViaArray drives the exact collision pair.

High (Gemini) — 32-bit overflow in length parser: parseUserKeyLenPrefix used int(uint32) which wraps to negative on 32-bit architectures when the high bit is set, bypassing the bounds check. Now compares in uint64 space before converting.

Medium (Gemini ×3) — kindByKey centralisation: r.kindByKey[uk] = redisKindHash was duplicated across HandleHashMeta and HandleHashField. Centralised in hashState helper. Per the loop's caller-audit rule: I grep'd r\.hashState( across internal/backup/ — three callers (HandleHashMeta, HandleHashField, HandleTTL's redisKindHash arm) all reach hashState via the same userKey identity, and the TTL caller already knows the kind so the now-implicit write is idempotent. Documented the audit rationale in the helper's comment.

go test -race, GOOS=windows GOARCH=amd64 go build, GOOS=js GOARCH=wasm go build, and golangci-lint all clean.


claude Bot commented May 2, 2026

Claude finished @bootjp's task in 2m 54s.


Review — Round 1 fixes (commit cacd4489)

  • Read changed files
  • Review P1 fix: hash field name array encoding
  • Review High fix: 32-bit overflow in length parser
  • Review Medium fix: kindByKey centralisation
  • Check for new issues introduced
  • Post final review

All three fixes are correct and well-targeted. Notes below in order of importance.


P1 — Array-of-records output shape ✅

The switch from a JSON-object fields map to []hashFieldRecord correctly closes both reported failure modes:

  • The UTF-8 literal %FF and the binary byte 0xFF now occupy separate array entries; neither can shadow the other.
  • Long binary field names no longer go through EncodeSegment/SHA-fallback at all — the base64 envelope carries the raw bytes directly.

marshalRedisBinaryValue is clean. The internal redisHashState.fields map[string][]byte still uses raw byte-sequences as Go string keys, which is correct (Go string comparison is byte-order; string([]byte{0xff}) != string([]byte("%FF"))). The path from HandleHashField → fields map → marshalHashJSON → marshalRedisBinaryValue is lossless.

TestRedisDB_HashBinaryFieldNameRoundTripsViaArray drives the exact collision pair. base64URLEncode in the test matches base64.RawURLEncoding bit-for-bit (single-byte 0xff → "_w" confirmed).

Nit (no action required): sort.Strings(names) sorts binary field names by byte order. Deterministic and correct, though worth noting in the format doc that the sort key is the raw byte sequence, not any user-visible ordering.


High — 32-bit bounds check in parseUserKeyLenPrefix

if uint64(len(b)) < uint64(hashUserKeyLenSize)+uint64(ukLen) {
    return nil, false
}
return b[hashUserKeyLenSize : hashUserKeyLenSize+int(ukLen)], true //nolint:gosec // bounded above

The uint64 comparison is correct. After the check passes, on a 32-bit platform int is 32-bit; but since uint64(len(b)) >= 4 + uint64(ukLen) and len(b) is a non-negative int, we have ukLen <= len(b) - 4 <= MaxInt32 - 4, so int(ukLen) is non-negative. The nolint comment explains the invariant. ✅


Medium — kindByKey centralised in hashState

The audit comment documents the three callers and explains why the idempotent write from the HandleTTL path is safe. No new callers are possible outside the package (unexported method). ✅


Two observations worth noting for the follow-up

1. RedisHashMetaDeltaPrefix ordering risk (nit)

RedisHashMetaPrefix = "!hs|meta|" and RedisHashMetaDeltaPrefix = "!hs|meta|d|" — the delta prefix has the meta prefix as a strict prefix. A dispatcher that checks HasPrefix(key, RedisHashMetaPrefix) before HasPrefix(key, RedisHashMetaDeltaPrefix) would route delta-meta keys to HandleHashMeta, where parseHashMetaKey would strip !hs|meta| and try to parse d|<…> as a 4-byte length — either silently producing garbage or returning ErrRedisInvalidHashKey. The constant is unused today (no delta handler yet), but the dispatcher registration order will matter when it lands. Worth checking that the dispatcher evaluates RedisHashMetaDeltaPrefix before RedisHashMetaPrefix.
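The ordering rule can be sketched in a few lines; route is a hypothetical dispatcher, while the prefix strings match the constants quoted above.

```go
package main

import (
	"fmt"
	"strings"
)

// "!hs|meta|d|" extends "!hs|meta|", so the longer delta prefix must be
// tested before the base meta prefix or delta keys are mis-routed.
const (
	metaPrefix      = "!hs|meta|"
	metaDeltaPrefix = "!hs|meta|d|"
)

func route(key string) string {
	switch {
	case strings.HasPrefix(key, metaDeltaPrefix): // most specific first
		return "delta"
	case strings.HasPrefix(key, metaPrefix):
		return "meta"
	default:
		return "other"
	}
}

func main() {
	fmt.Println(route("!hs|meta|d|payload")) // delta
	fmt.Println(route("!hs|meta|payload"))   // meta
}
```

Reversing the two cases silently turns every delta key into a "meta" hit, which is exactly the hazard the review describes.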

2. TTL-before-meta snapshot ordering (minor, pre-existing)

TestRedisDB_HashTTLInlinedFromScanIndex puts TTL after meta+field. If a snapshot ever places !redis|ttl|<K> before the corresponding !hs|meta|<K> (e.g. a scanner ordered by key prefix), kindByKey[K] is still redisKindUnknown at TTL time, and the expiry is counted as an orphan and silently dropped from the hash JSON. This is pre-existing behaviour and the snapshot engine probably guarantees consistent ordering, but a test for the reversed order would lock down the behaviour explicitly (or expose a need for a two-pass approach). Not a blocker here.


Verdict: All three reviewer-identified defects are correctly fixed. The test coverage for the P1 scenario is solid. Ready to merge once the pre-existing ordering observation above is acknowledged.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4646b9f7f6


Comment thread internal/backup/redis_hash.go Outdated
Comment on lines +76 to +78
if len(fieldName) == 0 {
return cockroachdberr.Wrapf(ErrRedisInvalidHashKey,
"empty field name in key %q", key)

P1: Permit empty hash field names

HandleHashField currently treats a zero-length field name as malformed, but Redis hash field names are binary-safe and may legitimately be empty. In this case a valid key like !hs|fld|<len><userKey> (with no trailing field bytes) will cause backup decoding to error out instead of emitting the hash, so backups can fail on real data created via HSET with an empty field name.


bootjp added 2 commits May 2, 2026 19:41
Codex P1 round 13 (commit 4646b9f):

HandleHashField was rejecting zero-length field names as
ErrRedisInvalidHashKey, but Redis hash field names are
binary-safe and `HSET k "" v` is a valid command. The live
store emits a key shaped exactly `!hs|fld|<len><userKey>` with
no trailing field bytes for that case, so backup decoding would
error out on real data created via HSET with an empty field
name.

Remove the explicit zero-length check; the array-of-records
JSON shape introduced in round 1 already handles empty names
without collision (an entry like {"name":"", "value":"..."}
sorts and round-trips distinctly).

Caller audit per loop's audit-semantics rule (fail-closed →
fail-open behavior change for empty fieldName):
  - HandleHashField has only test-code callers; no production
    pipeline consumes it yet (per `grep -n "HandleHashField"
    internal/backup/`).
  - The truncated-key test (TestRedisDB_HashRejectsTruncatedFieldKey)
    drives a malformed binary key whose parseHashFieldKey returns
    ok=false; that branch still surfaces ErrRedisInvalidHashKey,
    independent of the empty-name check we removed.

Test: TestRedisDB_HashAcceptsEmptyFieldName drives one empty-
name field and one named field, asserts both survive distinct
records in the output JSON.

bootjp (Owner) commented May 2, 2026

@claude review

Round 2 fix for Codex P1 round 13 on commit 4646b9f (commit f44a4e7b):

HandleHashField was rejecting zero-length field names as ErrRedisInvalidHashKey, but Redis hash field names are binary-safe and HSET k "" v is a valid command. The live store emits exactly !hs|fld|<len><userKey> with no trailing field bytes for that case, so backup decoding would error out on real data.

Removed the explicit zero-length check; the array-of-records JSON shape introduced in round 1 already handles empty names without collision (an entry like {"name":"", "value":"..."} round-trips distinctly).

Caller audit per loop's audit-semantics rule (fail-closed → fail-open for empty fieldName):

  • HandleHashField has only test-code callers (no production pipeline consumes it yet).
  • The truncated-key test TestRedisDB_HashRejectsTruncatedFieldKey drives a malformed binary key where parseHashFieldKey returns ok=false; that branch still surfaces ErrRedisInvalidHashKey, independent of the empty-name check.

Test: TestRedisDB_HashAcceptsEmptyFieldName drives one empty-name field and one named field, asserts both survive distinctly in the output JSON. go test -race and golangci-lint clean.


claude Bot commented May 2, 2026

Claude finished @bootjp's task in 1m 29s.


Review — Round 2 fix (commit f44a4e7b)

  • Read changed files
  • Review Round 2 fix: empty hash field name acceptance
  • Verify test coverage for empty field name
  • Check for new issues introduced
  • Post final review

The fix is correct and minimal. Analysis below.


The change

HandleHashField previously held:

if len(fieldName) == 0 {
    return cockroachdberr.Wrapf(ErrRedisInvalidHashKey,
        "empty field name in key %q", key)
}

This is removed. The trigger is legitimate: Redis hash field names are binary-safe and HSET k "" v is a valid command that the live store encodes as !hs|fld|<4-byte ukLen><userKey> with no trailing bytes. The zero-length rejection would therefore cause real backups to fail on valid data.


Correctness walk-through

parseHashFieldKey with no trailing bytes:

fieldName := rest[hashUserKeyLenSize+len(userKey):]
return userKey, fieldName, true

When the key is exactly !hs|fld|<4><userKey> with nothing after the user key bytes, rest[4+len(userKey):] is an empty slice. parseUserKeyLenPrefix already verified that len(rest) >= 4 + ukLen, so no out-of-bounds — returns (userKey, []byte{}, true). ✅

Map key collision with empty field name:

st.fields[string(fieldName)] — Go's empty string "" is a valid and distinct map key. No path in the encoder writes "" as a field name except a genuine empty-field-name arrival, so there is no collision risk. ✅

Duplicate empty field names (two HSET k "" … operations):

st.fields[""] = bytes.Clone(value) overwrites on a second arrival — consistent with Redis semantics (last write wins for the same field). ✅

declaredLen / length-mismatch warning:

The test registers declaredLen=2 and inserts exactly 2 fields (empty-name + "named"). The mismatch warning is NOT triggered. Correct. ✅

Sort order:

sort.Strings sorts the raw byte sequences of field names. Empty string "" sorts before any non-empty string, so fields[0]["name"] being "" in the test is correct. ✅


Test coverage

TestRedisDB_HashAcceptsEmptyFieldName covers:

  • Meta + 2 fields (one empty-name, one named) → Finalize() succeeds
  • Array length is 2
  • fields[0] has name="", value="empty-name-value"
  • fields[1] has name="named"

TestRedisDB_HashRejectsTruncatedFieldKey remains intact: it drives a key where parseHashFieldKey returns ok=false (malformed ukLen that exceeds the available bytes), and that path still surfaces ErrRedisInvalidHashKey. The removal of the zero-length check does not affect this test. ✅


One minor nit (no action required)

internal/backup/redis_hash.go:72 — the new comment includes "Codex P1 round 13 (PR #725)" which is a task reference. CLAUDE.md says these belong in PR descriptions rather than code comments since they rot as the codebase evolves. The rest of the comment — explaining that empty field names are binary-safe in Redis and that the live store emits exactly the no-trailing-bytes key shape — is the non-obvious why that justifies keeping a comment at all. A future reader would benefit from it even without the PR reference. Worth noting if the project ever does a comment cleanup pass.


Verdict: The fix is minimal, correct, and the regression test is solid. Pre-existing open observations from Round 1 (delta-prefix dispatch ordering, TTL-before-meta snapshot ordering) remain unchanged and were acknowledged in the last round. Ready to merge.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6abf62f547


// parseHashMetaKey strips !hs|meta| and the 4-byte BE userKeyLen prefix.
// Returns (userKey, true) on success.
func parseHashMetaKey(key []byte) ([]byte, bool) {
rest := bytes.TrimPrefix(key, []byte(RedisHashMetaPrefix))

P1: Exclude hash meta-delta keys from meta parsing

parseHashMetaKey trims RedisHashMetaPrefix, which also matches !hs|meta|d|... delta keys, so those records are treated as base-meta candidates and then fail parsing as malformed keys. Delta keys are a normal part of live Redis hash state (the codebase even documents that delta prefixes share the meta prefix in adapter/redis_retry.go), so when snapshot dispatch sends hash-prefixed records to this encoder, valid snapshots containing unresolved deltas can abort with ErrRedisInvalidHashKey instead of being skipped or handled.


Codex P1 round 14 (commit 6abf62f):

The delta prefix `!hs|meta|d|` shares the base meta prefix
`!hs|meta|` (the live store/hash_helpers.go documents the same
nesting). A snapshot dispatcher that routes records by "starts
with RedisHashMetaPrefix" lands delta records in HandleHashMeta,
where parseHashMetaKey trimmed the meta prefix and tried to
parse `d|<len>...` as `<len><userKey>` — failing with
ErrRedisInvalidHashKey on real snapshots that contain unresolved
deltas.

Phase 0a's hash output emits an array of OBSERVED fields
(`!hs|fld|...` records), so the delta arithmetic the live system
uses to recompute Len is not needed at backup time. Skip delta
keys silently in HandleHashMeta. Defense-in-depth:
parseHashMetaKey also returns `!ok` for delta-prefixed keys so a
future refactor that bypasses the HandleHashMeta short-circuit
still surfaces a parse failure rather than silent state
corruption.

Caller audit (per loop's audit-semantics rule for fail-closed →
fail-open behavior change):
  - HandleHashMeta has only test-code callers; no production
    pipeline consumes it yet.
  - parseHashMetaKey is called only from HandleHashMeta. The
    !ok return is mapped to ErrRedisInvalidHashKey in
    HandleHashMeta's existing path; the delta short-circuit
    runs first now so the parser path never sees a delta key
    in normal flow.

Test: TestRedisDB_HashHandleHashMetaSkipsDeltaKey synthesises a
realistic delta key and asserts both that HandleHashMeta returns
nil and that parseHashMetaKey returns !ok directly.

bootjp (Owner) commented May 2, 2026

@claude review

Round 3 fix for Codex P1 round 14 on commit 6abf62f (commit b757dadf):

The delta prefix !hs|meta|d| shares the base meta prefix !hs|meta| (the live store/hash_helpers.go documents the same nesting). A snapshot dispatcher that routes records by "starts with RedisHashMetaPrefix" landed delta records in HandleHashMeta, where parseHashMetaKey trimmed the meta prefix and tried to parse d|<len>... as <len><userKey> — failing with ErrRedisInvalidHashKey on real snapshots containing unresolved deltas.

Phase 0a's hash output emits an array of OBSERVED fields (!hs|fld| records), so the delta arithmetic the live system uses to recompute Len is not needed at backup time. HandleHashMeta now skips delta keys silently. Defense-in-depth: parseHashMetaKey also returns !ok for delta-prefixed keys so a future refactor that bypasses the HandleHashMeta short-circuit surfaces a parse failure rather than silent state corruption.

Caller audit per loop's rule (fail-closed → fail-open behavior):

  • HandleHashMeta has only test-code callers; no production pipeline yet.
  • parseHashMetaKey is called only from HandleHashMeta; the !ok return path still maps to ErrRedisInvalidHashKey for non-delta malformed keys.

Test: TestRedisDB_HashHandleHashMetaSkipsDeltaKey synthesises a realistic delta key and asserts both that HandleHashMeta returns nil and that parseHashMetaKey returns !ok directly.

go test -race and golangci-lint clean.

@claude

claude Bot commented May 2, 2026

Claude finished @bootjp's task in 1m 40s


Review — Round 3 fix (commit b757dadf)

  • Read changed files
  • Review delta-key skip logic in HandleHashMeta
  • Review defense-in-depth parseHashMetaKey change
  • Check test coverage for TestRedisDB_HashHandleHashMetaSkipsDeltaKey
  • Check for new issues introduced
  • Post final review

The fix is correct and the defense-in-depth approach is better than what I flagged in Round 1. Analysis below.


The problem recapped

RedisHashMetaDeltaPrefix = "!hs|meta|d|" is a strict extension of RedisHashMetaPrefix = "!hs|meta|". A dispatcher routing by HasPrefix(key, RedisHashMetaPrefix) routes delta keys into HandleHashMeta. The old parseHashMetaKey stripped !hs|meta| and tried to parse d|<4-byte len>... as <4-byte len><userKey> — producing ErrRedisInvalidHashKey on valid snapshots containing unresolved deltas. Confirmed against store/hash_helpers.go: HashMetaDeltaPrefix = "!hs|meta|d|" (line 19) — the prefix strings in the backup package match the live store exactly.


HandleHashMeta skip (lines 63-65) ✅

if bytes.HasPrefix(key, []byte(RedisHashMetaDeltaPrefix)) {
    return nil
}

The skip happens before r.hashState(userKey), so no phantom hash state is created for a delta-only key. Phase 0a emits observed fields (!hs|fld| records), so the delta arithmetic that the live system uses to recompute Len is genuinely unneeded here — the design rationale is correct.

This resolves the "dispatcher evaluation order" concern I raised in Round 1 with a better approach: rather than requiring the dispatcher to evaluate the longer prefix first, the handler itself is now safe regardless of how the dispatcher is ordered. ✅


parseHashMetaKey defense-in-depth (lines 133-135) ✅

if bytes.HasPrefix(key, []byte(RedisHashMetaDeltaPrefix)) {
    return nil, false
}

The parser-level guard is correct defense-in-depth. A future caller that bypasses the HandleHashMeta short-circuit would get a clean parse failure (!ok → ErrRedisInvalidHashKey) rather than silent state corruption from interpreting d| as a 4-byte length prefix. The comment accurately describes this invariant: "Callers that want delta-tolerant behavior (HandleHashMeta) should detect the delta prefix BEFORE calling this function." ✅

The ordering is safe: the delta check runs before bytes.TrimPrefix(key, []byte(RedisHashMetaPrefix)), so the "not a meta key at all" check (len(rest) == len(key)) is never evaluated for a delta key — which is fine since returning nil, false early is already the correct result.


Test: TestRedisDB_HashHandleHashMetaSkipsDeltaKey

The test synthesises a structurally realistic delta key:

!hs|meta|d| + [4-byte BE len=1] + "k" + [8-byte commitTS=1] + [4-byte seqInTxn=0]

This matches store.HashMetaDeltaKey("k", 1, 0) byte-for-byte (same prefix, same encoding shape). Both assertions are present:

  • HandleHashMeta returns nil → silent skip is exercised ✅
  • parseHashMetaKey returns !ok → parser-level guard is exercised ✅

One minor gap (no action required): the test doesn't call Finalize() to confirm the delta key doesn't accidentally create an entry in r.hashes. It doesn't need to — HandleHashMeta returns before r.hashState() is called, so no state is created. But if future reviewers see only this test in isolation, the coverage relationship to flushHashes is implicit. The existing round-trip tests cover Finalize; not a gap.


Outstanding observations (from prior rounds)

TTL-before-meta snapshot ordering (Round 1, pre-existing): No change here; still a pre-existing behaviour worth a follow-up test but not a blocker.

Task-reference comments (Round 2 nit, pre-existing): The HandleHashMeta and parseHashMetaKey comments both include "Codex P1 round 14 (PR #725 #13)" inline. Per CLAUDE.md these belong in commit/PR descriptions, not code. Not a blocker; worth a future cleanup pass.


Verdict: All three rounds of reviewer-identified defects are correctly fixed. The Round 3 approach (make the handler robust regardless of dispatcher order + defense-in-depth at the parser boundary) is cleaner than requiring dispatcher ordering and directly addresses what I flagged in Round 1. Ready to merge.
