backup: Redis zset encoder (Phase 0a) by bootjp · Pull Request #790 · bootjp/elastickv

bootjp · 2026-05-19T09:15:34Z

Summary

Adds the Redis sorted-set (zset) encoder for the Phase 0 logical
snapshot decoder (docs/design/2026_04_29_proposed_snapshot_logical_decoder.md,
lines 334-335). Mirrors the hash/list/set encoders shipped in
#725/#755/#758. After this lands, Phase 0a's remaining Redis work
is the stream encoder + HLL TTL sidecar; the per-adapter encoders
themselves will be complete.

Output shape per the design:

{"format_version": 1,
 "members": [{"member": "alice", "score": 100}],
 "expire_at_ms": null}

Members sorted by raw byte order (not by score) so diff -r between
two dumps with the same logical contents stays line-stable across
score-only mutations. Scores emit as JSON numbers for finite values,
ZADD-conventional "+inf" / "-inf" strings for ±Inf (json.Marshal
rejects non-finite floats). NaN scores fail closed at intake — Redis's
ZADD rejects NaN at the wire level, so a NaN at backup time indicates
corruption.

Wire layout (mirrors store/zset_helpers.go):

!zs|meta|<userKeyLen(4)><userKey> → 8-byte BE Len
!zs|mem|<userKeyLen(4)><userKey><member> → 8-byte IEEE 754 score
!zs|scr|<userKeyLen(4)><userKey><sortableScore(8)><member> →
silently discarded (secondary index; !zs|mem| is the source of
truth)
!zs|meta|d|<userKeyLen(4)><userKey><commitTS(8)><seqInTxn(4)> →
silently skipped (delta family; same policy as hash/set encoders)

TTL routing: !redis|ttl|<userKey> for a registered zset folds into
the JSON expire_at_ms field, matching the set/list/hash inlining,
so a restorer replays ZADD + EXPIRE in one shot.

Self-review (5 lenses)

Data loss — !zs|mem| is the source of truth; !zs|scr| and
delta records are intentional silent skips with audit notes.
Empty zsets (Len=0 but meta seen) still emit a file because
TYPE k → zset is observable to clients.
Concurrency / distributed — RedisDB is sequential per scope
(matches the existing per-DB encoder contract); no shared state.
Performance — per-zset state in a map, flushed once at Finalize.
Bounded by maxWideColumnItems on the live side; sort is O(n log n)
on member-name bytes. Identical cost shape to hash/set encoders.
Data consistency — JSON field order pinned via struct tags
(not map). Inf score uses json.RawMessage so the same score
key emits as either a number or a string. NaN fails closed.
Test coverage — 17 table-driven tests under
internal/backup/redis_zset_test.go:
- round-trip basic / empty / TTL inlining
- binary member via base64 envelope
- delta-key skip (both HandleZSetMeta entry + parseZSetMetaKey guard)
- HandleZSetMetaDelta explicit entry
- HandleZSetScore silent-discard
- malformed meta length / overflow / MaxInt64 boundary
- malformed member-value length / NaN rejection
- ±Inf string-form serialization
- members-without-meta still emits file
- duplicate-members latest-wins (ZADD semantics)

Caller audit (per `/loop` standing instruction)

Semantics-changing edit: new case redisKindZSet: branch in
HandleTTL (redis_string.go:310). Purely additive — the new branch
fires only when zsetState() has previously registered the key.
No existing handler maps to redisKindZSet, so no prior call site
changes behavior. Verified:

grep -n 'redisKindZSet' internal/backup/
# internal/backup/redis_string.go:88
# internal/backup/redis_string.go:310
# internal/backup/redis_zset.go:166

Three references, all new in this PR. No legacy caller assumes the
prior behavior.

Test plan

go test -race ./internal/backup/ → ok
golangci-lint run ./internal/backup/... → 0 issues
go build ./... → ok
go vet ./internal/backup/... → ok

Summary by CodeRabbit

New Features
- Added Redis sorted-set (ZSET) backup and restore functionality.
- TTL values are now inlined within sorted-set backup records.
- Enhanced floating-point score handling, including proper serialization of infinity values.
Tests
- Comprehensive test coverage added for sorted-set backup operations.

Decodes !zs|meta|/!zs|mem|/!zs|scr|/!zs|meta|d| snapshot records into the per-zset `zsets/<key>.json` shape defined by the Phase 0 design (docs/design/2026_04_29_proposed_snapshot_logical_decoder.md line 334-335). Mirrors the hash/list/set encoder shape: - !zs|meta| → 8-byte BE declared member count - !zs|mem|<m> → member m with 8-byte IEEE 754 score - !zs|scr| → secondary score index; silently discarded (!zs|mem| is source of truth at backup time) - !zs|meta|d| → delta records; silently skipped (set/hash policy) Output is sorted by raw member-name bytes so `diff -r` of two dumps with the same logical contents but mutated scores stays line-stable. Scores serialize as JSON numbers for finite values and the ZADD-conventional `"+inf"`/`"-inf"` strings for non-finite ones (json.Marshal rejects ±Inf so the score field uses json.RawMessage). NaN scores fail closed at HandleZSetMember: Redis's ZADD rejects NaN at the wire level, so a NaN in storage indicates corruption that we refuse to silently propagate into the dump. TTL routing: !redis|ttl|<userKey> for a registered zset key folds into the zset JSON's expire_at_ms (matching the set/list/hash inlining), so a restorer replays ZADD + EXPIRE in one shot without chasing a separate sidecar. Self-review: 1. Data loss — !zs|mem| is the source of truth; !zs|scr| and the delta family are intentional silent skips with caller-audit notes. 2. Concurrency — RedisDB is sequential per scope (matches the existing per-DB encoder contract); no shared state across DBs. 3. Performance — per-zset state buffered in a map; flushed at Finalize. Bounded by maxWideColumnItems on the live side. Sort is O(n log n) on member-name bytes; identical cost shape to the hash/set encoders that shipped in #725/#758. 4. Consistency — JSON field-order determinism preserved (struct tags, not map). Inf score uses string form via json.RawMessage so the same `score` key accepts both shapes. 5. Coverage — 17 table-driven tests: - round-trip basic / empty zset / TTL inlining - binary member via base64 envelope - delta-key skip (both HandleZSetMeta entry + parseZSetMetaKey guard) - HandleZSetMetaDelta explicit entry point - HandleZSetScore silent discard - malformed meta length / overflow / MaxInt64 boundary - malformed member-value length / NaN rejection - ±Inf string-form serialization - members-without-meta still emits file - duplicate-members latest-wins (ZADD semantics) Caller audit for semantics-changing edit (new `case redisKindZSet:` in HandleTTL, redis_string.go:310): purely additive. The new branch fires only when zsetState() has previously registered the key; no existing handler maps to redisKindZSet, so no prior call site changes behavior. Verified via `grep -n 'redisKindZSet' internal/backup/`.

coderabbitai · 2026-05-19T09:15:50Z

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 16 minutes and 8 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3d8ecb6a-d83b-415a-b789-5b96ef5e02a4

📥 Commits

Reviewing files that changed from the base of the PR and between 5e323cc and 8eca5cd.

📒 Files selected for processing (5)

internal/backup/redis_set.go
internal/backup/redis_string.go
internal/backup/redis_string_test.go
internal/backup/redis_zset.go
internal/backup/redis_zset_test.go

📝 Walkthrough

Walkthrough

This PR adds complete Redis sorted-set (ZSET) backup support by introducing ZSET snapshot record handlers, per-key state accumulation with TTL inlining, and deterministic JSON emission. The implementation extends the existing RedisDB infrastructure with a new redisKindZSet kind and integrates ZSET flushing into the finalize pipeline.

Changes

ZSET Backup Feature

Layer / File(s)	Summary
ZSET Integration into TTL Router `internal/backup/redis_string.go`	`RedisDB` gains a new `redisKindZSet` kind enum value and `zsets` state buffer map. `HandleTTL` routes ZSET TTL records into the per-key ZSET state (enabling TTL inlining into JSON output), and `Finalize()` calls `flushZSets` to emit JSON files alongside existing hash/list/set outputs.
ZSET Data Model and Parser Helpers `internal/backup/redis_zset.go` (lines 1–202)	Key prefix constants (`!zs
ZSET Snapshot Intake Handlers `internal/backup/redis_zset.go` (lines 74–168)	`HandleZSetMeta` parses declared member count with int64 overflow guards and skips delta records. `HandleZSetMember` decodes big-endian IEEE-754 scores, rejects NaN, and applies latest-wins semantics. `HandleZSetScore` and `HandleZSetMetaDelta` are no-ops that safely accept but discard those record types. Lazy state creation and `kindByKey` registration enable per-key ZSET routing in the TTL path.
ZSET JSON Emission and Marshaling `internal/backup/redis_zset.go` (lines 204–329)	`flushZSets` iterates accumulated ZSETs and warns on member-count mismatches. `writeZSetJSON` encodes user key paths, marshals state to JSON, and performs atomic writes. `marshalZSetJSON` produces deterministic output: collects and byte-sorts member names, marshals scores via `marshalRedisZSetScore` (which encodes ±Inf as JSON strings), conditionally includes `expire_at_ms`, and wraps with `format_version`.
Hash Marshaler Linting Annotation `internal/backup/redis_hash.go`	`marshalHashJSON` gains an expanded comment explaining its intentional duplication for JSON field-order determinism and a `//nolint:dupl` annotation to suppress linting noise.
Comprehensive Test Suite `internal/backup/redis_zset_test.go`	17 test functions validate ZSET round-trip encoding, TTL inlining (with sidecar absence), member-count mismatch warnings, binary member base64 enveloping, delta meta-key skipping, error handling for malformed/overflow/NaN values, ±Inf score JSON string encoding, parser-level validation, members-without-meta emission, duplicate member latest-wins semantics, and `math.MaxInt64` boundary acceptance. Test helpers encode snapshot keys and scores using big-endian formats matching intake expectations.

Sequence Diagram

sequenceDiagram
  participant Snapshot as Snapshot Data
  participant Intake as Intake Handlers
  participant State as per-ZSET State
  participant TTL as TTL Router
  participant Emit as JSON Emission
  participant Output as zsets/key.json
  
  Snapshot->>Intake: !zs|meta|... (member count)
  Intake->>State: Store declaredLen, metaSeen
  
  Snapshot->>Intake: !zs|mem|... (member name + score)
  Intake->>State: Decode IEEE-754 score, store member
  
  Snapshot->>TTL: !redis|ttl|... (for ZSET key)
  TTL->>State: Inlined into hasTTL + expireAtMs
  
  Snapshot->>Intake: !zs|score|... or !zs|meta|d|...
  Intake->>Intake: Silently discard (no-op)
  
  Snapshot->>Emit: Finalize() called
  Emit->>State: Collect all ZSET states
  State->>Emit: Check declaredLen vs observedLen
  Emit->>Emit: Warn if mismatch
  Emit->>Emit: Sort members by binary name
  Emit->>Emit: Encode scores (+Inf/-Inf as strings)
  Emit->>Emit: Include expire_at_ms if TTL set
  Emit->>Output: Atomic write with format_version

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly Related PRs

bootjp/elastickv#713: Adds string/HLL/TTL support to the same simple-type RedisDB router that ZSET extends in this PR, with overlapping TTL routing and Finalize() integration patterns.
bootjp/elastickv#755: Adds list backup support using the same RedisDB buffering architecture, HandleTTL inlining, and Finalize() flush pattern that ZSET reuses for consistent JSON output.

Poem

🐰 A hop through sorted sets so fine,
With members floating in a line,
IEEE-754 scores encoded true,
And TTL tucked in JSON's view—
One more backup type shines through!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'backup: Redis zset encoder (Phase 0a)' directly and clearly describes the main change: adding a Redis sorted-set (zset) encoder for the Phase 0 backup system.
Docstring Coverage	✅ Passed	Docstring coverage is 91.18% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch backup/redis-zset-encoder

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

bootjp · 2026-05-19T09:15:51Z

@claude review

claude · 2026-05-19T09:16:09Z

Claude finished @bootjp's task in 3m 31s —— View job

Code Review: PR #790 — Redis zset encoder (Phase 0a)

Tasks

Gather context
Read internal/backup/redis_zset.go
Read internal/backup/redis_zset_test.go
Read internal/backup/redis_string.go (modified)
Read internal/backup/redis_hash.go (modified)
Analyze correctness, edge cases, conventions
Post review feedback

Wire format

The encoder constants and decoding arithmetic exactly mirror store/zset_helpers.go:

!zs|meta| / !zs|mem| / !zs|scr| / !zs|meta|d| prefixes match the live constants.
redisZSetScoreSize = 8 is the correct mirror of zsetMetaSizeBytes.
math.Float64frombits(binary.BigEndian.Uint64(value)) correctly inverts MarshalZSetScore (IEEE 754 big-endian).
The uint64 > math.MaxInt64 overflow guard in HandleZSetMeta (line 104) correctly mirrors the pattern in the hash/list/set encoders.

Issues

Medium — TTL scan-order gap not tested (functional risk)

!redis|ttl| keys sort lexicographically before !zs|... keys (r=0x72 < z=0x7A). In a single sorted Pebble scan, TTL records for zset keys arrive before any !zs|meta| or !zs|mem| record registers the key in kindByKey. That means HandleTTL hits the redisKindUnknown branch and increments orphanTTLCount — the TTL is not inlined, and the JSON expire_at_ms stays null.

The same ordering issue exists for the set encoder (s=0x73 > r=0x72), so this is a pre-existing design decision rather than a new regression. But:

TestRedisDB_ZSetTTLInlinedFromScanIndex only exercises the cooperative order (meta → member → TTL). There is no test for the inverse (TTL → meta → member).
If the real dispatcher does a single prefix-sorted scan, TTL inlining for zsets (and sets) will silently fall through to orphanTTLCount on every expiring key.

It would be valuable to add a test that documents the current (Phase 0a) behavior under the real scan order, even if that behavior is "TTL becomes an orphan and is reported but not inlined." Otherwise the existing test creates a false sense that TTL inlining works end-to-end.

If the dispatcher is known to do per-prefix scans in a controlled order, a brief note in the encoder's top-of-file comment would be enough.

Fix/document this →

Low — `marshalRedisZSetScore`: no unreachable guard for NaN

redis_zset.go:254-256 documents that NaN is unreachable because HandleZSetMember rejects it at intake. However, marshalRedisZSetScore has no explicit guard. If NaN somehow enters st.members (e.g., via a future refactor that bypasses the handler), json.Marshal(nan) returns a standard-library error — which surfaces correctly. This is not a silent corruption bug, but it surfaces as an opaque json: unsupported value error rather than the domain-specific ErrRedisInvalidZSetMember. A small defensive check would make the failure mode obvious:

if math.IsNaN(score) {
    // unreachable: HandleZSetMember rejects NaN at intake
    return nil, cockroachdberr.New("backup: internal error: NaN score reached marshalling (should have been rejected at intake)")
}

Confirmed correct

sort.Strings byte order — sort.Strings compares by byte content in Go; the "raw byte order" claim is accurate.
//nolint:dupl cross-reference — redis_hash.go:244 now points to redis_zset.go:marshalZSetJSON. The rationale (can't collapse parallel wrappers without breaking JSON field-order determinism) is sound.
Empty zset emit — Len=0 with metaSeen=true still writes a file; TYPE key → zset is observable to clients. Correct.
Members-without-meta — metaSeen=false guard in flushZSets:212 prevents a spurious length-mismatch warning. Test covers this.
HandleZSetScore / HandleZSetMetaDelta as named no-ops — The discards are explicit functions rather than routing to an orphan sink, so a future grep -n 'every !zs| prefix has a handler' audit finds them. The design rationale is documented.
Delta double-guard — parseZSetMetaKey independently rejects delta-prefixed keys (line 177), so a dispatcher that bypasses HandleZSetMeta's entry guard still surfaces a parse error rather than corrupt state. Well-motivated.
HandleTTL case redisKindZSet — Additive-only change. r.zsetState(userKey) inside the case is a no-op re-registration of the already-registered key; correct and idempotent.
Test coverage — 17 cases covering round-trip, empty zset, TTL inlining, binary member base64 envelope, delta-key skip (both entry and parser levels), score-index discard, malformed meta length, overflow guard, MaxInt64 boundary, malformed member value, NaN rejection, ±Inf string form, members-without-meta, duplicate-member latest-wins.

Summary

Solid implementation that faithfully mirrors the hash/list/set encoders and handles all the wire-format corner cases well. The one functional concern is the TTL scan-order edge case (which the existing set encoder shares); whether that's a problem in practice depends on the dispatcher's scan strategy. Adding a test that documents the behavior under inverted order would close the coverage gap without requiring a design change.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5e323cc539

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-19T09:18:49Z

+	case redisKindZSet:
+		// Same per-record TTL inlining: ZADD + EXPIRE replay in
+		// one shot from the per-zset JSON, no separate sidecar.
+		st := r.zsetState(userKey)
+		st.expireAtMs = expireAtMs
+		st.hasTTL = true


Buffer zset TTLs that arrive before zset rows

For expiring sorted sets in real snapshot order, this redisKindZSet branch is not reached: Pebble snapshots are emitted by iterator order (store/snapshot_pebble.go uses iter.First(); iter.Next()), and the TTL key prefix !redis|ttl| sorts before all zset prefixes !zs|.... That means HandleTTL sees redisKindUnknown, increments orphanTTLCount, and discards the expiry before zsetState is created, so every restored zset with a TTL becomes persistent. The zset TTL needs to be buffered/rerouted when the zset row is observed rather than relying on prior type observation.

Useful? React with 👍 / 👎.

gemini-code-assist

Code Review

This pull request introduces support for Redis Sorted Sets (ZSets) in the backup system, adding a new encoder that translates raw snapshot records into versioned JSON files. The implementation handles member-score mappings, TTL inlining, and ensures deterministic output by sorting members by name. Review feedback suggests refactoring the marshaling logic for ZSets and Hashes into a shared helper to reduce code duplication, although the developer previously noted that duplication was intentional to maintain JSON field order.

gemini-code-assist · 2026-05-19T09:22:12Z

+// Reviewers comparing the two functions should diff
+// (hashFieldRecord, "fields") against (zsetMemberRecord, "members") —
+// the only intentional divergence.
+func marshalZSetJSON(st *redisZSetState) ([]byte, error) { //nolint:dupl // see comment above


While deterministic field order is required for stable backups, repository guidelines (Rule 9) require centralizing common logic to avoid duplication. Instead of using nolint:dupl to suppress the warning, consider refactoring marshalZSetJSON and marshalHashJSON to use a shared helper function that preserves the necessary field order.

References

Centralize common complex logic, such as retry/backoff mechanisms, into shared helper functions to avoid code duplication and improve maintainability.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

internal/backup/redis_zset_test.go (1)
16-19: ⚡ Quick win

Remove avoidable //nolint:gosec suppressions in test key/value builders.

Line 18 and Lines 35, 45, 58, and 70 can avoid suppression with explicit bounds-safe conversions. Keep //nolint only where truly unavoidable.
Proposed refactor
 func encodeZSetMetaValue(memberCount int64) []byte {
+	if memberCount < 0 {
+		panic("test bug: negative memberCount")
+	}
 	v := make([]byte, 8)
-	binary.BigEndian.PutUint64(v, uint64(memberCount)) //nolint:gosec
+	binary.BigEndian.PutUint64(v, uint64(memberCount))
 	return v
 }
+
+func putUserKeyLen(l *[4]byte, userKey string) {
+	if len(userKey) > math.MaxUint32 {
+		panic("test bug: userKey too large")
+	}
+	binary.BigEndian.PutUint32(l[:], uint32(len(userKey)))
+}
 func zsetMetaKey(userKey string) []byte {
 	out := []byte(RedisZSetMetaPrefix)
 	var l [4]byte
-	binary.BigEndian.PutUint32(l[:], uint32(len(userKey))) //nolint:gosec
+	putUserKeyLen(&l, userKey)
 	out = append(out, l[:]...)
 	return append(out, userKey...)
 }
(Apply the same putUserKeyLen call pattern in zsetMemberKey, zsetScoreKey, and zsetMetaDeltaKey.)
As per coding guidelines, Use gofmt formatting and pass linters configured in .golangci.yaml ... Avoid //nolint annotations — refactor instead.

Also applies to: 35-35, 45-45, 58-58, 70-70
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/backup/redis_zset_test.go` around lines 16 - 19, The test helpers
use //nolint:gosec to silence integer-to-uint64 conversions; replace those
suppressions by doing explicit, bounds-safe casts or using the existing
putUserKeyLen pattern so the conversion is obviously safe: in
encodeZSetMetaValue replace the nolint line with binary.BigEndian.PutUint64(v,
uint64(memberCount)) using an explicit checked/constrained conversion (e.g.,
ensure memberCount >= 0 then cast) or mirror putUserKeyLen’s approach; apply the
same refactor to zsetMemberKey, zsetScoreKey, and zsetMetaDeltaKey so no
//nolint is required and the conversions are clearly safe.
internal/backup/redis_zset.go (1)
87-93: ⚡ Quick win

Hoist []byte prefix conversions to package-level variables to avoid per-record allocations in snapshot processing.

Lines 87, 177, 180, and 192 repeatedly convert string constants to bytes at call sites during hot-path snapshot scanning. Moving these to package-level []byte variables eliminates four allocations per record processed.
♻️ Suggested refactor
 const (
 	RedisZSetMetaPrefix      = "!zs|meta|"
 	RedisZSetMemberPrefix    = "!zs|mem|"
 	RedisZSetScorePrefix     = "!zs|scr|"
 	RedisZSetMetaDeltaPrefix = "!zs|meta|d|"

 	redisZSetScoreSize = 8
 )
+
+var (
+	redisZSetMetaPrefixBytes      = []byte(RedisZSetMetaPrefix)
+	redisZSetMemberPrefixBytes    = []byte(RedisZSetMemberPrefix)
+	redisZSetMetaDeltaPrefixBytes = []byte(RedisZSetMetaDeltaPrefix)
+)
Then replace:

Line 87: []byte(RedisZSetMetaDeltaPrefix) → redisZSetMetaDeltaPrefixBytes

Line 177: []byte(RedisZSetMetaDeltaPrefix) → redisZSetMetaDeltaPrefixBytes

Line 180: []byte(RedisZSetMetaPrefix) → redisZSetMetaPrefixBytes

Line 192: []byte(RedisZSetMemberPrefix) → redisZSetMemberPrefixBytes
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/backup/redis_zset.go` around lines 87 - 93, Hot-path code repeatedly
converts string prefix constants to []byte per record; define package-level
byte-slice variables (e.g., redisZSetMetaDeltaPrefixBytes,
redisZSetMetaPrefixBytes, redisZSetMemberPrefixBytes) initialized from the
existing string constants (RedisZSetMetaDeltaPrefix, RedisZSetMetaPrefix,
RedisZSetMemberPrefix) and replace all call-site conversions such as
bytes.HasPrefix(key, []byte(RedisZSetMetaDeltaPrefix)) and other occurrences
(`[]byte(RedisZSetMetaPrefix)`, `[]byte(RedisZSetMemberPrefix)`) with the new
package-level variables; keep the semantic checks (e.g., parseZSetMetaKey usage)
unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/backup/redis_zset_test.go`:
- Around line 310-316: The test currently only checks envelope["base64"] is
non-empty; instead decode the base64 string from envelope["base64"] (the
variable envelope in redis_zset_test.go) back to bytes and assert the decoded
bytes equal the original payload expected in the test (compare against the known
original member payload used earlier in the test), failing the test if decoding
errors or the bytes differ to guarantee true round-trip behavior.

---

Nitpick comments:
In `@internal/backup/redis_zset_test.go`:
- Around line 16-19: The test helpers use //nolint:gosec to silence
integer-to-uint64 conversions; replace those suppressions by doing explicit,
bounds-safe casts or using the existing putUserKeyLen pattern so the conversion
is obviously safe: in encodeZSetMetaValue replace the nolint line with
binary.BigEndian.PutUint64(v, uint64(memberCount)) using an explicit
checked/constrained conversion (e.g., ensure memberCount >= 0 then cast) or
mirror putUserKeyLen’s approach; apply the same refactor to zsetMemberKey,
zsetScoreKey, and zsetMetaDeltaKey so no //nolint is required and the
conversions are clearly safe.

In `@internal/backup/redis_zset.go`:
- Around line 87-93: Hot-path code repeatedly converts string prefix constants
to []byte per record; define package-level byte-slice variables (e.g.,
redisZSetMetaDeltaPrefixBytes, redisZSetMetaPrefixBytes,
redisZSetMemberPrefixBytes) initialized from the existing string constants
(RedisZSetMetaDeltaPrefix, RedisZSetMetaPrefix, RedisZSetMemberPrefix) and
replace all call-site conversions such as bytes.HasPrefix(key,
[]byte(RedisZSetMetaDeltaPrefix)) and other occurrences
(`[]byte(RedisZSetMetaPrefix)`, `[]byte(RedisZSetMemberPrefix)`) with the new
package-level variables; keep the semantic checks (e.g., parseZSetMetaKey usage)
unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a09d53a0-f475-408c-b3ba-4d2181e38e17

📥 Commits

Reviewing files that changed from the base of the PR and between 8c0d07f and 5e323cc.

📒 Files selected for processing (4)

internal/backup/redis_hash.go
internal/backup/redis_string.go
internal/backup/redis_zset.go
internal/backup/redis_zset_test.go

coderabbitai · 2026-05-19T09:22:28Z

+	envelope, ok := rec["member"].(map[string]any)
+	if !ok {
+		t.Fatalf("expected base64 envelope on member, got %T(%v)", rec["member"], rec["member"])
+	}
+	if envelope["base64"] == "" {
+		t.Fatalf("base64 envelope missing payload: %v", envelope)
+	}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Assert base64 payload correctness, not only presence.

Line 314-Line 316 only verifies non-empty payload, so a wrong payload could still pass. Decode and compare bytes to pin true round-trip behavior.

Proposed test hardening

import ( + "bytes" + "encoding/base64" "encoding/binary" "encoding/json" "math" @@ envelope, ok := rec["member"].(map[string]any) if !ok { t.Fatalf("expected base64 envelope on member, got %T(%v)", rec["member"], rec["member"]) } - if envelope["base64"] == "" { - t.Fatalf("base64 envelope missing payload: %v", envelope) - } + payload, ok := envelope["base64"].(string) + if !ok || payload == "" { + t.Fatalf("base64 envelope missing payload: %v", envelope) + } + decoded, err := base64.StdEncoding.DecodeString(payload) + if err != nil { + t.Fatalf("invalid base64 payload %q: %v", payload, err) + } + if !bytes.Equal(decoded, []byte{0x80, 0xff, 0x01}) { + t.Fatalf("decoded member = %v, want %v", decoded, []byte{0x80, 0xff, 0x01}) + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

envelope, ok := rec["member"].(map[string]any)

if !ok {

t.Fatalf("expected base64 envelope on member, got %T(%v)", rec["member"], rec["member"])

}

if envelope["base64"] == "" {

t.Fatalf("base64 envelope missing payload: %v", envelope)

}

envelope, ok := rec["member"].(map[string]any)

if !ok {

t.Fatalf("expected base64 envelope on member, got %T(%v)", rec["member"], rec["member"])

}

payload, ok := envelope["base64"].(string)

if !ok || payload == "" {

t.Fatalf("base64 envelope missing payload: %v", envelope)

}

decoded, err := base64.StdEncoding.DecodeString(payload)

if err != nil {

t.Fatalf("invalid base64 payload %q: %v", payload, err)

}

if !bytes.Equal(decoded, []byte{0x80, 0xff, 0x01}) {

t.Fatalf("decoded member = %v, want %v", decoded, []byte{0x80, 0xff, 0x01})

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/backup/redis_zset_test.go` around lines 310 - 316, The test currently only checks envelope["base64"] is non-empty; instead decode the base64 string from envelope["base64"] (the variable envelope in redis_zset_test.go) back to bytes and assert the decoded bytes equal the original payload expected in the test (compare against the known original member payload used earlier in the test), failing the test if decoding errors or the bytes differ to guarantee true round-trip behavior.

…d records Pebble snapshots are emitted in encoded-key order (store/snapshot_pebble.go::iter.First()+Next()), and `!redis|ttl|` lex-sorts BEFORE every wide-column prefix where the type letter is >= 's' (`!st|` set, `!stream|`, `!zs|` zset — because `r` < `s`/`z`). The original HandleTTL routed any unknown-kind expiry straight into orphanTTLCount, so for sets, streams, and zsets the TTL was DROPPED before zsetState/setState/streamState could claim it. Restored sets/zsets/streams with TTL became permanent. Codex P1 finding on PR #790. The same bug exists in the already- merged set encoder (PR #758); this commit fixes both retroactively. Stream encoder (PR #791) inherits the fix once rebased. Fix: HandleTTL parks unknown-kind expiries in a new pendingTTL map. Each wide-column state-init that may face the bad ordering (setState, zsetState — and streamState once PR #791 lands) drains the entry on first user-key registration via claimPendingTTL(). Finalize counts whatever remains in pendingTTL as truly-unmatched orphans (a TTL whose user key never appeared in any typed record — indicates store corruption or an unknown type prefix). Semantic-change caller audit (per /loop standing instruction): - HandleTTL's redisKindUnknown branch: previously incremented orphanTTLCount immediately; now buffers and lets Finalize count. All callers: only the per-record dispatcher in cmd/elastickv-snapshot-decode (not yet built — Phase 0a follow- up). No external caller mutates orphanTTLCount today. - TestRedisDB_OrphanTTLCountedNotBuffered: updated to assert intake-time orphanTTLCount==0 + pendingTTL grows, then post- Finalize orphanTTLCount==n. - New caller claimPendingTTL: called only by zsetState and setState in this PR. hashState/listState don't call it because their type prefixes (`!hs|`/`!lst|`) lex-sort BEFORE `!redis|ttl|` so the typed record arrives first; pendingTTL is always empty for them. Verified via `grep -n 'claimPendingTTL' internal/backup/`. New tests: - TestRedisDB_ZSetTTLArrivesBeforeRows — pins the fix for zsets. - TestRedisDB_SetTTLArrivesBeforeRows — retroactive coverage for PR #758's set encoder. - TestRedisDB_OrphanTTLCountsTrulyUnmatchedKeys — pins the new Finalize-time orphan semantics. Self-review: 1. Data loss — the original code DROPPED real TTL'd sets/zsets/streams on every backup. This fix recovers them. No new data-loss surface introduced. 2. Concurrency — pendingTTL is added to RedisDB which is already sequential-per-scope; no new locking required. 3. Performance — pendingTTL holds (string-userKey, uint64-expireAt) pairs. The string allocation cost matches kindByKey's, which we already pay for every typed record. The original P2 round 6 concern (don't buffer arbitrarily-large value payloads) is preserved: we still only buffer 8-byte expiry, not value bytes. 4. Consistency — drain happens at FIRST state registration so a later HandleTTL re-arrival (which would route through the redisKindSet/ZSet case in HandleTTL) is a no-op (st.expireAtMs gets overwritten with the same value). 5. Coverage — 3 new tests + 1 updated test. All 64 redis tests pass.

bootjp · 2026-05-19T09:41:05Z

@claude review

claude · 2026-05-19T09:41:21Z

Claude finished @bootjp's task in 2m 38s —— View job

Code Review: PR #790 — Redis zset encoder (Phase 0a)

Tasks

Gather context from PR description and prior reviews
Read internal/backup/redis_zset.go
Read internal/backup/redis_zset_test.go
Read internal/backup/redis_string.go (modified)
Read internal/backup/redis_set.go (modified)
Cross-check against store/zset_helpers.go
Analyze and post review

The PR landed in two commits. The second commit (63e54d9) directly addresses the Codex P1 TTL scan-order finding that the previous Claude review and Codex both flagged, so I'm reviewing the whole PR as-shipped (including both commits).

P1 Fix — `pendingTTL` buffering (commit 2)

The fix is architecturally correct and well-tested. The essential flow:

HandleTTL parks unknown-kind expiries into pendingTTL (a new map[string]uint64 on RedisDB)
zsetState() and setState() call claimPendingTTL() on first registration of a user key, draining the parked expiry into the zset/set's expireAtMs
Finalize() counts len(r.pendingTTL) as orphans after all flush steps, so only keys that truly never appeared as typed records are warned

Key correctness properties verified:

No double-set: when TTL arrives after the key is registered, HandleTTL hits case redisKindZSet: → r.zsetState(userKey) fast-paths via if st, ok := r.zsets[uk]; ok { return st } — claimPendingTTL is never re-called. Safe.
No double-drain: claimPendingTTL does delete(r.pendingTTL, uk) atomically. A second call for the same key returns (0, false).
Finalize ordering: flushSets and flushZSets both run before orphanTTLCount is incremented, so no typed record can claim a TTL after the count fires.
hashState/listState correctness: Those state-init functions don't call claimPendingTTL because !hs|/!lst| lex-sort before !redis|ttl| — the typed record always arrives first. pendingTTL is always empty for those keys when their state-init fires. ✓

New tests TestRedisDB_ZSetTTLArrivesBeforeRows, TestRedisDB_SetTTLArrivesBeforeRows, and TestRedisDB_OrphanTTLCountsTrulyUnmatchedKeys all pin this behavior correctly.

Wire format and codec

All field sizes and decoding arithmetic match store/zset_helpers.go:

MarshalZSetScore → math.Float64bits(score) / binary.BigEndian.PutUint64 matches math.Float64frombits(binary.BigEndian.Uint64(value)) in HandleZSetMember:133. ✓
ZSetMetaKey layout matches parseZSetMetaKey. ✓
Delta key !zs|meta|d| double-guard (entry check + parseZSetMetaKey rejection) is the correct defense-in-depth pattern from the hash encoder. ✓
uint64 > math.MaxInt64 overflow guard mirrors hash/list/set encoders exactly. ✓

Confirmed correct — design decisions

Sort by member-name bytes, not score — sort.Strings in Go sorts by byte value of the string content; the claim in the docstring is accurate.
//nolint:dupl — The marshalZSetJSON / marshalHashJSON structural parallel cannot be collapsed without either losing JSON field-order determinism (map) or needing per-type generics (distinct JSON tags). The //nolint is the right call here.
NaN fail-closed at intake — HandleZSetMember:134 returns ErrRedisInvalidZSetMember for NaN; this prevents silent re-corruption and surfaces at the source record. ✓
Empty-zset emit — metaSeen=true with Len=0 still writes zsets/<key>.json; TYPE k → zset is observable. ✓
Members-without-meta — flushZSets:223 checks st.metaSeen before firing the length-mismatch warning, so a members-only run doesn't emit a spurious warning. ✓
HandleZSetScore / HandleZSetMetaDelta — Named no-op functions rather than routing to the orphan sink. Future audits via grep 'every !zs| prefix has a handler' will find them. ✓

Issues

Low — Base64 test assertion is shallow (`redis_zset_test.go:314-316`)

TestRedisDB_ZSetBinaryMemberUsesBase64Envelope only verifies the envelope is non-empty:

if envelope["base64"] == "" {
    t.Fatalf("base64 envelope missing payload: %v", envelope)
}

This passes even if the encoded bytes are wrong. The CodeRabbit suggestion is sound — decode and compare against the expected bytes {0x80, 0xff, 0x01}:

payload, _ := envelope["base64"].(string)
decoded, err := base64.StdEncoding.DecodeString(payload)
if err != nil { t.Fatalf(...) }
if !bytes.Equal(decoded, []byte{0x80, 0xff, 0x01}) {
    t.Fatalf("decoded member = %v, want [0x80 0xff 0x01]", decoded)
}

Fix this →

Nit — `//nolint:gosec` in test key-builder helpers

Per CLAUDE.md convention (Avoid //nolint — refactor instead), the five //nolint:gosec suppressions in encodeZSetMetaValue, zsetMetaKey, zsetMemberKey, zsetScoreKey, and zsetMetaDeltaKey could be removed by adding an explicit bounds panic (e.g., if memberCount < 0 { panic("test bug") } before the conversion). Given these are test helpers with fully-controlled inputs this is very low impact, but worth tracking to stay consistent with the project convention.

Nit — Per-record `[]byte(prefix)` allocations (CodeRabbit)

bytes.HasPrefix(key, []byte(RedisZSetMetaDeltaPrefix)) at lines 87, 177 and similar at 180, 192 create a []byte from a string constant on every call. During a real snapshot scan these fire per record. Hoisting to package-level var redisZSetMetaDeltaPrefixBytes = []byte(RedisZSetMetaDeltaPrefix) (and analogues) eliminates 4 allocations per dispatched record. In the same vein the set and hash encoders have the same pattern — fixing zset here would be consistent.

Future work (not blocking)

Stream encoder — pendingTTL's doc comment already notes that streamState must call claimPendingTTL once PR backup: Redis stream encoder (Phase 0a) #791 lands. The infrastructure is in place.
store.UnmarshalZSetMeta missing overflow guard — The live decoder does int64(binary.BigEndian.Uint64(b)) without the > math.MaxInt64 check. The backup encoder is more defensive here. Worth a separate follow-up to align (or document that the live side trusts the write path to never produce > MaxInt64).

Summary

Solid implementation. The Codex P1 scan-order bug was correctly identified and fixed in commit 2 with proper pendingTTL infrastructure and three new tests. The zset encoder itself faithfully mirrors the hash/list/set encoders across wire format, delta handling, NaN rejection, ±Inf serialization, binary member safety, and empty-key emit rules. The only functional gap remaining is the weak base64 assertion in the binary-member test; the rest are style nits.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 63e54d91a7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-19T09:45:19Z

+const (
+	RedisZSetMetaPrefix      = "!zs|meta|"
+	RedisZSetMemberPrefix    = "!zs|mem|"
+	RedisZSetScorePrefix     = "!zs|scr|"
+	RedisZSetMetaDeltaPrefix = "!zs|meta|d|"


Include legacy zset blobs in dumps

For zsets stored only as the legacy single-blob key, this new encoder never claims the record because it only recognizes the wide-column !zs|... prefixes. That format is still part of the live data model: adapter/redis_compat_types.go:82 defines !redis|zset|, adapter/redis_compat_commands.go:3495-3508 still writes that key for non-empty persisted zset updates, and adapter/redis_compat_helpers.go:610-631 reads it as the fallback when no wide-column members exist. A snapshot containing such a zset will omit it from zsets/ entirely (and any TTL remains orphaned), so backups can lose user-visible sorted sets; add a handler for !redis|zset|<key> or migrate those blobs during encoding.

Useful? React with 👍 / 👎.

…ield preservation Two P1 findings from chatgpt-codex on PR #791: P1a: Buffer stream TTLs that arrive before stream rows Pebble snapshots emit records in encoded-key order (store/snapshot_pebble.go::iter.First()+Next()), and `!redis|ttl|` lex-sorts BEFORE `!stream|...` because `r` < `s`. In real snapshot order the TTL arrives FIRST, kindByKey is still redisKindUnknown when HandleTTL fires, and the original code counted the TTL as an orphan and dropped it — every TTL'd stream restored as permanent. Same root cause as the set encoder's latent bug in PR #758. This commit adds a pendingTTL infrastructure (matching the parallel fix on PR #790) so the expiry parks during the redisKindUnknown window and drains when streamState first registers the user key. The set encoder gets the same retroactive drain. P1b: Preserve duplicate stream fields instead of map-collapsing XADD permits duplicate field names within one entry (e.g. `XADD s * f v1 f v2`). The protobuf entry stores the interleaved slice verbatim, but my marshalStreamJSONL collapsed pairs into `map[string]string`, silently dropping every duplicate. A restore of such an entry would lose the second (and later) pair. Fix: emit `fields` as a JSON ARRAY of `{name, value}` records (streamFieldJSON). Order is the protobuf's interleaved order so a restore can replay the original XADD argv exactly. The design example at docs/design/2026_04_29_proposed_snapshot_logical_decoder.md:338 showed object form. That representation was unsafe for streams (though fine for hashes where the wire-level encoder normalises field names earlier). The format is owned by Phase 0 — adjusted in this PR before the format ships any consumers. Caller audit (per /loop standing instruction): - HandleTTL's redisKindUnknown branch: same semantic change as PR #790's r1 — previously incremented orphanTTLCount on intake; now buffers in pendingTTL and lets Finalize count at end. Same audit conclusion: no external callers of orphanTTLCount; TestRedisDB_OrphanTTLCountedNotBuffered updated to assert the new intake/Finalize split. - streamEntryJSON.Fields type change `map → slice`: only marshalled by encoding/json; the only reader is the test suite, which is updated in this commit. No on-disk format compatibility concerns because Phase 0 has not shipped a consumer yet. - New caller claimPendingTTL: called by setState (retroactive) and streamState (new) in this PR. hashState/listState don't call it because their type prefixes lex-sort BEFORE `!redis|ttl|`. Verified via `grep -n 'claimPendingTTL' internal/backup/`. New tests: - TestRedisDB_StreamDuplicateFieldsPreserved — pins P1b fix. - TestRedisDB_StreamTTLArrivesBeforeRows — pins P1a fix for streams. - TestRedisDB_SetTTLArrivesBeforeRows — retroactive coverage for PR #758's set encoder (same root cause as the stream bug). - TestRedisDB_StreamFieldsDecodedToArray (renamed from ToObject) — updated to match the array shape. Self-review: 1. Data loss — the original code DROPPED real TTL'd streams on every backup AND dropped duplicate-field entries' later pairs. This fix recovers both. No new data-loss surface introduced. 2. Concurrency — pendingTTL added to RedisDB which is already sequential-per-scope; no new locking required. 3. Performance — pendingTTL holds (string-userKey, uint64-expireAt) pairs; same allocation shape as kindByKey. Fields slice replaces a map of the same logical size — slightly cheaper actually (no hash overhead). 4. Consistency — drain happens at FIRST state registration. The array form preserves insertion order from the protobuf so the restored XADD argv matches. 5. Coverage — 4 new tests + 2 updated. All 78 redis tests pass.

…lob layout Codex flagged that the wide-column zset encoder skips the legacy consolidated single-key blob layout the live store still writes. A zset stored only as `!redis|zset|<userKey>` (with the magic- prefixed pb.RedisZSetValue body) is silently dropped from backup output and any inline TTL becomes an orphan — user-visible sorted-set data loss. Live-side references (adapter, not changed by this commit): - adapter/redis_compat_types.go:82 — redisZSetPrefix - adapter/redis_compat_commands.go:3495-3508 — writes the blob for non-empty persisted zset updates - adapter/redis_compat_helpers.go:610-631 — reads it as the fallback when no wide-column members exist Fix: new public RedisDB.HandleZSetLegacyBlob method that decodes the magic-prefixed pb.RedisZSetValue and registers the same per- member state HandleZSetMember would. The wide-column merge case (mid-migration snapshot containing BOTH layouts for the same user key) works because `!redis|zset|` sorts BEFORE `!zs|...` so the blob arrives first and wide-column rows then update / add members via the latest-wins map. Inline TTL: `!redis|zset|<k>` sorts BEFORE `!redis|ttl|<k>`, so HandleTTL after this handler sees redisKindZSet already and folds via the case-redisKindZSet branch. No pendingTTL detour needed for this ordering. Fail-closed contract (matches existing wide-column path): - Missing magic prefix → ErrRedisInvalidZSetLegacyBlob - Unmarshal error → ErrRedisInvalidZSetLegacyBlob - NaN score → ErrRedisInvalidZSetLegacyBlob (Redis ZADD rejects NaN at wire level) Caller audit (per /loop standing instruction): new public method HandleZSetLegacyBlob has no external callers. Verified via 'grep -rn HandleZSetLegacyBlob --include=*.go' — all matches inside the test file in this PR. The cmd/elastickv-snapshot-decode dispatcher (Phase 0a follow-up, not yet built) will route the `!redis|zset|` prefix to this handler. Parallel bug class: the SAME issue exists for `!redis|hash|`, `!redis|set|`, and `!redis|stream|` legacy blob prefixes. Those encoders shipped in earlier PRs (#725, #758, #791). Each needs its own legacy-blob handler in a follow-up PR; this commit fixes only the zset case codex flagged on PR #790. New tests: - TestRedisDB_ZSetLegacyBlobRoundTrip — basic round-trip - TestRedisDB_ZSetLegacyBlobThenWideColumnMerges — mid-migration - TestRedisDB_ZSetLegacyBlobWithInlineTTL — TTL ordering - TestRedisDB_ZSetLegacyBlobRejectsMissingMagic — fail-closed - TestRedisDB_ZSetLegacyBlobRejectsNaNScore — fail-closed - TestRedisDB_ZSetLegacyBlobRejectsMalformedKey — fail-closed Self-review: 1. Data loss — exact opposite: this commit RECOVERS zsets that were silently dropped. New fail-closed guards prevent silently importing a corrupt blob. 2. Concurrency — no new shared state; per-DB sequential as before. 3. Performance — one protobuf Unmarshal per legacy zset key (same as the live read path). Member map shares the same latest-wins behavior as wide-column intake. 4. Consistency — merge order (blob first, wide-column second) is determined by snapshot lex order; tested explicitly. 5. Coverage — 6 new tests. All 84 redis tests pass.

bootjp · 2026-05-19T09:52:33Z

@claude review

claude · 2026-05-19T09:52:49Z

Claude finished @bootjp's task in 4m 37s —— View job

Code Review: PR #790 — Redis zset encoder (Phase 0a) — Round 3

Tasks

Gather context from PR description and prior reviews
Read internal/backup/redis_zset.go (all 3 commits)
Read internal/backup/redis_zset_test.go
Read internal/backup/redis_string.go (HandleTTL routing)
Cross-check legacy !redis|zset| handler against live encoder
Check outstanding issues from prior reviews
Post comprehensive review

The third commit (5446190) addresses the Codex P1 finding on round 2: legacy !redis|zset|<userKey> blobs were silently dropped from backup output. This review covers the full three-commit PR as shipped.

Commit 3: `HandleZSetLegacyBlob` — correctness

Magic prefix match: redisZSetLegacyProtoPrefix = []byte{0x00, 'R', 'Z', 0x01} exactly matches adapter/redis_storage_codec.go:15 storedRedisZSetProtoPrefix. A live-side rename would surface as ErrRedisInvalidZSetLegacyBlob on any real cluster dump — the "canary" pattern documented in the constant's docstring is a good operational safeguard.

decodeZSetLegacyBlobValue: rejects payloads shorter than 4 bytes, rejects mismatched magic, rejects NaN scores — all fail-closed at intake. ✓

zsetState() + claimPendingTTL(): HandleZSetLegacyBlob calls r.zsetState(userKey) which calls claimPendingTTL on first registration. This means the pendingTTL drain works for legacy blobs as well, regardless of scan order. ✓

Medium — Incorrect scan-order claim; wrong test order for legacy-blob TTL

HandleZSetLegacyBlob's docstring (lines 202–210 in the current file) states:

!redis|zset| ALSO sorts BEFORE !redis|ttl|, so an inline TTL on the same user key will reach HandleTTL after this handler has already registered redisKindZSet.

!redis|ttl|k    → HandleTTL → kindByKey["k"] == redisKindUnknown → pendingTTL["k"] = ms
!redis|zset|k   → HandleZSetLegacyBlob → zsetState() → claimPendingTTL() → st.expireAtMs = ms

The pendingTTL mechanism handles this correctly (since zsetState() always calls claimPendingTTL()), so the behavior is correct — but the comment documents the wrong ordering and the wrong code path.

TestRedisDB_ZSetLegacyBlobWithInlineTTL tests blob-first-then-TTL, which is the non-real-world order (taking the case redisKindZSet: shortcut, not the pendingTTL path). The real-world TTL-first ordering for legacy blobs is not tested.

Two fixes needed:

Correct the docstring: !redis|ttl| sorts before !redis|zset| so the TTL arrives first; the pendingTTL drain in zsetState() handles it.
Add TestRedisDB_ZSetLegacyBlobTTLArrivesBeforeBlob that calls HandleTTL first then HandleZSetLegacyBlob, and asserts expire_at_ms is inlined. This pins the real-world path through pendingTTL.

Fix these →

Low — `TestRedisDB_ZSetLegacyBlobRejectsMalformedKey` has contradictory comments and tests the wrong boundary

The test docstring says "key with no trailing user-key bytes — parser must still accept it (empty user key is technically valid Redis)" and then the body comment says "Use a key that doesn't have the prefix to trigger the parse failure." The test uses []byte("not-the-right-prefix|k") — a key that doesn't have the !redis|zset| prefix at all.

Inspecting parseZSetLegacyBlobKey: it returns (rest, true) whenever TrimPrefix removes anything, including when rest is empty (prefix-only key). So HandleZSetLegacyBlob([]byte("!redis|zset|"), value) succeeds with an empty user key. The test never touches this boundary.

This isn't a functional bug — the test does exercise the prefix-missing error path — but the test name (RejectsMalformedKey) and first comment are misleading. Better: rename to TestRedisDB_ZSetLegacyBlobRejectsMissingPrefix and remove the contradictory "no trailing user-key bytes" comment.

Outstanding from prior reviews (still unaddressed)

Both prior Claude reviews flagged these. They remain open:

Low — TestRedisDB_ZSetBinaryMemberUsesBase64Envelope (redis_zset_test.go:346): only checks envelope["base64"] == "" (non-empty presence). A wrong base64 payload still passes. Fully decode and compare against []byte{0x80, 0xff, 0x01}:

payload, ok := envelope["base64"].(string)
if !ok || payload == "" { t.Fatalf(...) }
decoded, err := base64.StdEncoding.DecodeString(payload)
if err != nil { t.Fatalf("invalid base64: %v", err) }
if !bytes.Equal(decoded, []byte{0x80, 0xff, 0x01}) {
    t.Fatalf("decoded member = %v, want [0x80 0xff 0x01]", decoded)
}

Nit — Five //nolint:gosec suppressions remain in test helpers (encodeZSetMetaValue, zsetMetaKey, zsetMemberKey, zsetScoreKey, zsetMetaDeltaKey). Per CLAUDE.md: "Avoid //nolint — refactor instead." Adding a bounds panic before the cast makes the conversion obviously safe and removes the suppressions.

Nit — []byte(RedisZSetMetaDeltaPrefix), []byte(RedisZSetMetaPrefix), []byte(RedisZSetMemberPrefix) are allocated per dispatched record in parseZSetMetaKey, parseZSetMemberKey, and HandleZSetMeta. Hoisting these to package-level var (as CodeRabbit suggested) eliminates the per-record overhead.

Confirmed correct in commit 3

Magic prefix {0x00, 'R', 'Z', 0x01} matches the live storedRedisZSetProtoPrefix exactly ✓
decodeZSetLegacyBlobValue rejects payloads shorter than 4 bytes (empty protobuf body is 0 bytes; 0 < 4 → error) ✓
NaN fail-closed in decodeZSetLegacyBlobValue matches HandleZSetMember's contract ✓
Mid-migration merge (!redis|zset| arrives before !zs|...): ordering claim and test are correct ✓
zsetState() → claimPendingTTL() chain means pendingTTL works for legacy blobs even though the comment misstates the reason ✓
encodeZSetLegacyBlobValue test helper mirrors adapter/redis_storage_codec.go::marshalZSetValue exactly ✓
zsetLegacyBlobKey mirrors adapter/redis_compat_types.go:177 ZSetKey exactly ✓
6 new legacy-blob tests cover round-trip, mid-migration merge, NaN rejection, missing-magic rejection ✓

Summary

The Codex P1 data-loss fix is architecturally sound: HandleZSetLegacyBlob decodes the protobuf blob, registers the same per-key state HandleZSetMeta + HandleZSetMember would, and the zsetState() → claimPendingTTL() chain means TTL inlining works via pendingTTL for the real snapshot order. The one material issue is that the code comments state the wrong scan order for !redis|ttl| vs !redis|zset|, and the TTL test exercises the non-real-world path (blob-first), leaving the actual pendingTTL path for legacy blobs untested. The base64 shallow assertion and //nolint:gosec suppressions carry over from prior reviews.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5446190f78

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-19T09:56:06Z

+	st := r.zsetState(userKey)
+	for _, e := range entries {
+		st.members[e.member] = e.score
+	}


Ignore stale legacy zset blobs when wide rows exist

When a snapshot contains both !redis|zset|<key> and !zs|mem|... rows for the same key, this merge makes backup output include members from the legacy blob that are not user-visible anymore. I checked the live read path in adapter/redis_compat_helpers.go:610-620: as soon as any wide-column member exists, loadZSetAt returns only loadZSetMembersAt and uses the legacy blob only as a fallback when no wide rows exist. A dump taken after a partially migrated/stale legacy blob remains alongside current wide rows would therefore restore deleted or outdated members; the encoder should skip or replace legacy state once wide-column rows are observed for that key.

Useful? React with 👍 / 👎.

Codex flagged that the r2 fix (legacy-blob handler in #790 #2) mis-handles the mid-migration / stale-leftover case: when a snapshot contains BOTH `!redis|zset|<k>` and `!zs|mem|<k>...` for the same user key, my code merged both, surfacing members that are no longer visible to readers. Live read path (adapter/redis_compat_helpers.go:610-620 RedisServer.loadZSetAt): when ANY wide-column row exists, loadZSetMembersAt is used and the legacy blob is IGNORED. A stale `!redis|zset|` blob left over after a partial migration would be invisible to clients but my encoder was surfacing its entries in the dump — silent backup corruption. Fix: add a sawWide flag to redisZSetState that flips on every wide-column observation (HandleZSetMeta + HandleZSetMember). markZSetWide() clears any legacy entries previously deposited by HandleZSetLegacyBlob, and HandleZSetLegacyBlob short-circuits if sawWide is already set. The result matches the live read path's source-of-truth selection: - No wide rows → legacy blob is the source of truth. - Any wide row exists → wide rows are authoritative; legacy entries are ignored (whether they arrived before or after the wide row). Caller audit (per /loop standing instruction): the new markZSetWide method is called only from HandleZSetMeta line 160 and HandleZSetMember line 194. HandleZSetLegacyBlob (line 268) short-circuits on the same flag. No other call sites — verified via 'grep -rn markZSetWide|sawWide internal/backup/'. The semantic change (legacy entries cleared on first wide observation) is contained to redisZSetState; no public API or external caller visible. Test changes: - TestRedisDB_ZSetLegacyBlobThenWideColumnMerges renamed to TestRedisDB_ZSetWideColumnSupersedesLegacyBlob, expectations updated (bob-stale no longer survives). - New TestRedisDB_ZSetMetaAloneEvictsLegacyBlob — meta-only wide-column row evicts legacy blob. - New TestRedisDB_ZSetLegacyBlobAfterWideRowsIsDropped — reverse- order case (wide first, blob later). Self-review: 1. Data loss — opposite: this fix PREVENTS surfacing stale members that the live read path hides. Legacy blob is still applied when no wide rows exist (preserves the original P1 round 2 fix for legacy-only zsets). 2. Concurrency — no new shared state. 3. Performance — sawWide is a single bool per state; markZSetWide short-circuits on subsequent calls so the clear-loop runs at most once per state. 4. Consistency — encoder output now matches `loadZSetAt`'s read- side source-of-truth selection. Tested with both orderings (legacy-first-wide-later, wide-first-legacy-later, meta-alone). 5. Coverage — 1 renamed test + 2 new tests. All 90 redis tests pass.

bootjp · 2026-05-19T10:01:42Z

@claude review

claude · 2026-05-19T10:02:01Z

Claude finished @bootjp's task in 4m 12s —— View job

Code Review: PR #790 — Redis zset encoder (Phase 0a) — Round 4

Tasks

Gather context from PR description and prior reviews
Read internal/backup/redis_zset.go (all 4 commits)
Read internal/backup/redis_zset_test.go
Read internal/backup/redis_string.go and redis_set.go
Cross-check scan ordering and pendingTTL logic
Check all outstanding issues from Round 3
Analyze commit 4 (ab88f3e): wide-column rows supersede legacy blob
Post comprehensive review

Commit 4 (`ab88f3e`) — Wide-column supersedes legacy blob

This addresses the Codex P1 round-3 finding correctly. The sawWide flag + markZSetWide() mechanism mirrors the live read path (adapter/redis_compat_helpers.go:610-620) precisely:

markZSetWide() logic traced across all orderings:

Order	Trace	Result
legacy blob → wide member	`HandleZSetLegacyBlob` adds {alice:1, bob-stale:2}; `HandleZSetMember("alice", 99)` → `markZSetWide` clears map, adds alice:99	{alice:99} ✓
wide member → legacy blob	`HandleZSetMember` → sawWide=true; `HandleZSetLegacyBlob` → `st.sawWide` check → early return	wide-only ✓
legacy blob → meta(0)	`HandleZSetLegacyBlob` adds {stale:1}; `HandleZSetMeta(0)` → `markZSetWide` clears map	{} ✓
legacy blob alone	sawWide never set; entries stay in map through Finalize	legacy entries ✓

Key invariants confirmed:

markZSetWide() is called from both HandleZSetMeta (line 160) and HandleZSetMember (line 194), covering every wide-column entry point ✓
HandleZSetScore and HandleZSetMetaDelta are no-ops; neither creates state nor sets sawWide — correct, these are secondary-index / delta records ✓
Three new tests (TestRedisDB_ZSetWideColumnSupersedesLegacyBlob, TestRedisDB_ZSetMetaAloneEvictsLegacyBlob, TestRedisDB_ZSetLegacyBlobAfterWideRowsIsDropped) cover all key orderings ✓
The markZSetWide in-place map-clear (for k := range st.members { delete(st.members, k) }) avoids changing the pointer — consistent with the nil-check-free pattern used throughout ✓

No new issues found in commit 4.

Issues — Remaining from prior rounds (none addressed in commit 4)

Medium — `HandleZSetLegacyBlob` docstring states the wrong scan order for TTL (lines 252–257)

The comment claims:

!redis|zset| ALSO sorts BEFORE !redis|ttl|, so an inline TTL … will reach HandleTTL after this handler has already registered redisKindZSet.

!redis|ttl|k   → HandleTTL → redisKindUnknown → pendingTTL["k"] = ms
!redis|zset|k  → HandleZSetLegacyBlob → zsetState() → claimPendingTTL() → st.expireAtMs = ms

The code is correct — the zsetState() → claimPendingTTL() chain handles this. Only the comment documents the wrong path. The first paragraph at line 246–250 also incorrectly says !redis|zset| sorts before !redis|ttl|.

Fix this →

Low — `TestRedisDB_ZSetLegacyBlobWithInlineTTL` tests the non-real-world scan order

TestRedisDB_ZSetLegacyBlobWithInlineTTL (line 783) calls HandleZSetLegacyBlob first, then HandleTTL. In a real Pebble scan the opposite is true (!redis|ttl| < !redis|zset| as shown above), so the test exercises the HandleTTL → redisKindZSet → zsetState(k) shortcut path — not the actual pendingTTL → claimPendingTTL() path that fires in production.

A companion test is needed:

func TestRedisDB_ZSetLegacyBlobTTLArrivesBeforeBlob(t *testing.T) {
    db, root := newRedisDB(t)
    // Real-world order: TTL arrives first (!redis|ttl| < !redis|zset|).
    if err := db.HandleTTL([]byte("k"), encodeTTLValue(fixedExpireMs)); err != nil {
        t.Fatal(err)
    }
    value := encodeZSetLegacyBlobValue(t, []zsetLegacyEntry{{member: "m", score: 1}})
    if err := db.HandleZSetLegacyBlob(zsetLegacyBlobKey("k"), value); err != nil {
        t.Fatal(err)
    }
    if err := db.Finalize(); err != nil { t.Fatal(err) }
    got := readZSetJSON(t, filepath.Join(root, "redis", "db_0", "zsets", "k.json"))
    if zsetFloat(t, got, "expire_at_ms") != float64(fixedExpireMs) {
        t.Fatalf("expire_at_ms = %v want %d — pendingTTL path for legacy blob failed",
            got["expire_at_ms"], fixedExpireMs)
    }
}

Fix this →

Low — Base64 assertion is non-functional (`redis_zset_test.go:346-348`, carried from rounds 2–3)

if envelope["base64"] == "" {
    t.Fatalf("base64 envelope missing payload: %v", envelope)
}

envelope["base64"] is a any typed map value; comparing it to "" compares the interface value to a string. This check passes even if the payload is corrupt. Fully decode and compare:

payload, ok := envelope["base64"].(string)
if !ok || payload == "" {
    t.Fatalf("base64 envelope missing payload: %v", envelope)
}
decoded, err := base64.StdEncoding.DecodeString(payload)
if err != nil {
    t.Fatalf("invalid base64 payload %q: %v", payload, err)
}
if !bytes.Equal(decoded, []byte{0x80, 0xff, 0x01}) {
    t.Fatalf("decoded member = %v, want [0x80 0xff 0x01]", decoded)
}

Fix this →

Nit — `//nolint:gosec` suppressions in test helpers (lines 50, 67, 77, 90, 102; carried from rounds 2–3)

Per CLAUDE.md: Avoid //nolint — refactor instead. The five integer-to-uint conversions in encodeZSetMetaValue, zsetMetaKey, zsetMemberKey, zsetScoreKey, and zsetMetaDeltaKey can be made obviously safe by adding an explicit bounds panic before the cast. CodeRabbit's proposed putUserKeyLen helper applies here.

Nit — `TestRedisDB_ZSetLegacyBlobRejectsMalformedKey` has contradictory comments (line 838–841; carried from round 3)

The test comment block says "Key has the prefix but no trailing user-key bytes — parser must still accept it (empty user key is technically valid Redis)" but the actual test passes a key that lacks the !redis|zset| prefix entirely ("not-the-right-prefix|k"). The test exercises the missing-prefix path, not the empty-user-key path. Rename to TestRedisDB_ZSetLegacyBlobRejectsMissingPrefix and remove the contradictory comment.

Confirmed correct across all 4 commits

sawWide mechanism mirrors adapter/redis_compat_helpers.go:610-620 exactly: wide-column rows are authoritative, legacy blob is a fallback consulted only when no wide rows exist ✓
markZSetWide is guarded by if st.sawWide { return } making it idempotent across multiple wide member records ✓
redisZSetLegacyProtoPrefix = []byte{0x00, 'R', 'Z', 0x01} matches adapter/redis_storage_codec.go:15 storedRedisZSetProtoPrefix ✓
!redis|zset| < !zs|... ordering is correct for the mid-migration merge case (at the first diverging byte: r=0x72 < z=0x7A) ✓
NaN fail-closed in decodeZSetLegacyBlobValue matches HandleZSetMember contract ✓
pendingTTL drain in zsetState() correctly handles the legacy-blob+TTL scenario at the code level (comment is wrong, code is right) ✓
encodeZSetLegacyBlobValue and zsetLegacyBlobKey test helpers match the live encoder/key builder exactly ✓
All six legacy-blob tests in commit 3 + three new wide-supersedes-legacy tests in commit 4 are well-constructed ✓

Summary

Commit 4 is architecturally correct: sawWide + markZSetWide() faithfully replicates the live read path's source-of-truth preference, and the three new tests pin all ordering variants. The one functional gap is that the real-world scan order for TTL'd legacy blobs (TTL arrives first, blob second) is not exercised by a test — the existing TestRedisDB_ZSetLegacyBlobWithInlineTTL tests the synthetic reverse order. The docstring at lines 252–257 documents the wrong path for the same reason. Both can be fixed together with a comment correction and a single new test.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ab88f3e096

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-19T10:03:47Z

+	// this state is stale; drop it. The live read path
+	// (adapter/redis_compat_helpers.go:610-620) makes the same
+	// choice on read.
+	r.markZSetWide(st)


Preserve legacy zsets when only meta exists

When a snapshot has a legacy !redis|zset|<k> blob plus a !zs|meta|<k> row but no !zs|mem| rows, this call clears the legacy members and the dump restores an empty zset. The live read path in adapter/redis_compat_helpers.go:610-623 only switches away from the legacy blob when the member-prefix scan returns rows; meta alone is not used as the value source. This can happen with stale/partial wide metadata alongside a still-visible legacy blob, so the backup should only evict legacy entries once a member row is observed, not on meta alone.

Useful? React with 👍 / 👎.

Codex flagged the same P1 on PR #791 round 2 — pendingTTL grows unboundedly when no zset handler is present to drain the parked TTLs. The same bug exists on PR #790's branch (the two PRs share the pendingTTL infrastructure; the codex finding's text applies verbatim). Fix: cap pendingTTL at pendingTTLCap entries (default 1 MiB). Once the cap is reached, subsequent unknown-kind TTLs fall back to immediate-orphan counting via the new parkUnknownTTL helper. New API: - RedisDB.WithPendingTTLCap(int) chainable setter. - pendingTTLOverflow counter surfaced in the orphan-TTL warning. Caller audit (per /loop standing instruction): - HandleTTL's redisKindUnknown branch behavior changes: previously ALWAYS buffered, now buffers up to cap then orphan-counts. Other branches unchanged. - Callers of HandleTTL: tests only. The cmd/elastickv-snapshot-decode driver (Phase 0a follow-up) will be the first production caller. - New helper parkUnknownTTL is package-private with one call site (HandleTTL). No prior call sites to audit. - Verified via `grep -rn HandleTTL|pendingTTLCap|WithPendingTTLCap --include=*.go internal/backup/`. New tests: - TestRedisDB_PendingTTLBoundedByCap. - TestRedisDB_WithPendingTTLCapZeroDisablesBuffering. - TestRedisDB_WithPendingTTLCapNegativeCoercedToZero. Self-review: 1. Data loss - cap can mis-classify TTLs in the overflow window that COULD have been drained by a later state-init. Mitigation: default cap (1 MiB) is well above the count of legitimately- buffered wide-column TTL'd keys on real clusters; operators can tune via WithPendingTTLCap. 2. Concurrency - no new shared state. 3. Performance - one extra comparison per unknown-kind TTL. 4. Consistency - mirrors the fix landing on PR #791 r3; when either PR merges first the other can rebase cleanly. 5. Coverage - 3 new tests + the existing TestRedisDB_OrphanTTLCountedNotBuffered (still pins the default-cap buffered + drained-at-Finalize semantics).

bootjp · 2026-05-19T19:46:49Z

@claude review

claude · 2026-05-19T19:47:03Z

Claude finished @bootjp's task in 3m 5s —— View job

Code Review: PR #790 — Redis zset encoder (Phase 0a) — Round 5

Tasks

Gather context from PR description and prior reviews (rounds 1–4)
Read internal/backup/redis_zset.go (all 5 commits)
Read internal/backup/redis_zset_test.go
Read internal/backup/redis_string.go
Review commit 5 (174456a): bound pendingTTL to prevent OOM
Check all outstanding issues from Round 4
Post comprehensive review

Commit 5 (`174456a`) — `pendingTTL` OOM bound

The fix is architecturally correct and complete:

defaultPendingTTLCap = 1 << 20 (~64 MiB at ~64 B/entry) with clear documentation in the constant's comment ✓
WithPendingTTLCap(0) disables buffering entirely (reverts to pre-pendingTTL immediate-orphan behavior); negative coerced to 0 ✓
parkUnknownTTL caps at pendingTTLCap entries; overflow increments both pendingTTLOverflow and orphanTTLCount immediately ✓
Finalize surfaces pending_ttl_buffer_overflow and pending_ttl_buffer_cap in the warning so operators can distinguish "exceeded cap" from "truly unmatched TTLs" ✓
Three tests (TestRedisDB_PendingTTLBoundedByCap, TestRedisDB_WithPendingTTLCapZeroDisablesBuffering, TestRedisDB_WithPendingTTLCapNegativeCoercedToZero) correctly pin all behavioral variants ✓

TestRedisDB_PendingTTLBoundedByCap arithmetic verified:
cap=8, 16 writes → intake: len(pendingTTL)=8, orphanTTLCount=8, pendingTTLOverflow=8 → after Finalize: orphanTTLCount=16. Correct.

Low — `parkUnknownTTL`: duplicate-key at-cap edge case

When the buffer is exactly at pendingTTLCap and a TTL record arrives for a key already in pendingTTL, the code takes the overflow branch (incrementing both pendingTTLOverflow and orphanTTLCount) rather than updating the existing entry. If the key is later claimed by a typed record, orphanTTLCount will have been incremented by 1 for a key that was actually buffered — a phantom orphan.

In practice Pebble snapshots don't emit duplicate !redis|ttl| records for the same key, so this is effectively unreachable today. But a one-line pre-check makes the edge case explicit and prevents overcounting in a corrupt snapshot:

func (r *RedisDB) parkUnknownTTL(userKey []byte, expireAtMs uint64) {
    uk := string(userKey)
    if _, alreadyBuffered := r.pendingTTL[uk]; alreadyBuffered {
        r.pendingTTL[uk] = expireAtMs // latest-wins update; no size change
        return
    }
    if len(r.pendingTTL) >= r.pendingTTLCap {
        r.pendingTTLOverflow++
        r.orphanTTLCount++
        return
    }
    r.pendingTTL[uk] = expireAtMs
}

Fix this →

Issues — Carried from prior rounds (still unaddressed in commit 5)

Medium — `HandleZSetLegacyBlob` docstring states wrong scan order for TTL

redis_zset.go:246–257 says:

!redis|zset| ALSO sorts BEFORE !redis|ttl| …

!redis|ttl|k   → HandleTTL → redisKindUnknown → pendingTTL["k"] = ms
!redis|zset|k  → HandleZSetLegacyBlob → zsetState() → claimPendingTTL() → st.expireAtMs = ms

Fix this →

Low — `TestRedisDB_ZSetLegacyBlobWithInlineTTL` tests non-real-world scan order

The test docstring (line 779) repeats the wrong claim (!redis|zset| sorts before !redis|ttl|) and the test exercises the blob-first → TTL-second path (i.e., the HandleTTL redisKindZSet fast-path). The real-world ordering is TTL-first → blob-second (the pendingTTL path) — which is currently untested for legacy blobs.

A companion test is still needed:

func TestRedisDB_ZSetLegacyBlobTTLArrivesBeforeBlob(t *testing.T) {
    t.Parallel()
    db, root := newRedisDB(t)
    // Real-world order: !redis|ttl|k sorts before !redis|zset|k (t < z).
    if err := db.HandleTTL([]byte("k"), encodeTTLValue(fixedExpireMs)); err != nil {
        t.Fatal(err)
    }
    value := encodeZSetLegacyBlobValue(t, []zsetLegacyEntry{{member: "m", score: 1}})
    if err := db.HandleZSetLegacyBlob(zsetLegacyBlobKey("k"), value); err != nil {
        t.Fatal(err)
    }
    if err := db.Finalize(); err != nil { t.Fatal(err) }
    got := readZSetJSON(t, filepath.Join(root, "redis", "db_0", "zsets", "k.json"))
    if zsetFloat(t, got, "expire_at_ms") != float64(fixedExpireMs) {
        t.Fatalf("expire_at_ms = %v want %d — pendingTTL path for legacy blob failed",
            got["expire_at_ms"], fixedExpireMs)
    }
}

Fix this →

Low — Base64 assertion is non-functional (`redis_zset_test.go:346–348`)

envelope["base64"] == "" compares an any-typed map value to the string "". A wrong-but-non-empty base64 payload still passes. Need a full round-trip decode (unchanged from Rounds 2–4):

payload, ok := envelope["base64"].(string)
if !ok || payload == "" {
    t.Fatalf("base64 envelope missing payload: %v", envelope)
}
decoded, err := base64.StdEncoding.DecodeString(payload)
if err != nil {
    t.Fatalf("invalid base64 payload %q: %v", payload, err)
}
if !bytes.Equal(decoded, []byte{0x80, 0xff, 0x01}) {
    t.Fatalf("decoded member = %v, want [0x80 0xff 0x01]", decoded)
}

Fix this →

Nit — `//nolint:gosec` in test helpers (lines 50, 67, 77, 90, 102)

Per CLAUDE.md: "Avoid //nolint — refactor instead." The five integer conversions in encodeZSetMetaValue, zsetMetaKey, zsetMemberKey, zsetScoreKey, and zsetMetaDeltaKey can be made obviously safe with a bounds panic (or putUserKeyLen helper as CodeRabbit suggested), eliminating the suppressions.

Nit — `TestRedisDB_ZSetLegacyBlobRejectsMalformedKey` has contradictory comments

The outer docstring (line 831) says "key with no trailing user-key bytes fails parse" but the inner comment (lines 837–840) says "empty user key is technically valid Redis … use a key that doesn't have the prefix." The test actually exercises the missing-prefix case, not an empty-user-key case. Rename to TestRedisDB_ZSetLegacyBlobRejectsMissingPrefix and remove the contradictory comment.

Confirmed correct across all 5 commits

pendingTTLCap default (1M entries, ~64 MiB) is documented and correctly applied ✓
WithPendingTTLCap(0) disables buffering; negative coercion to 0 is clean ✓
Finalize double-counts nothing: overflow path increments orphanTTLCount at intake; orphanTTLCount += len(pendingTTL) at Finalize adds only the buffered-but-unclaimed entries ✓
sawWide + markZSetWide() faithfully mirrors adapter/redis_compat_helpers.go:610-620 ✓
redisZSetLegacyProtoPrefix matches adapter/redis_storage_codec.go:15 ✓
NaN fail-closed in both HandleZSetMember and decodeZSetLegacyBlobValue ✓
!redis|zset| < !zs|... ordering correct (mid-migration merge case) ✓
zsetState() → claimPendingTTL() chain correctly handles TTL-before-zset ordering at the code level ✓

Summary

Commit 5's pendingTTL OOM bound is correct: the 1 << 20 cap, the WithPendingTTLCap override, and the three new tests are all well-constructed. The one behavioral gap is the duplicate-key-at-cap edge case in parkUnknownTTL (phantom orphan count on corrupt input, not reachable from a healthy Pebble snapshot).

From prior rounds, the docstring scan-order error (!redis|zset| sorts after, not before, !redis|ttl|) and the missing TestRedisDB_ZSetLegacyBlobTTLArrivesBeforeBlob test remain open — the code handles the real-world TTL-first ordering correctly via pendingTTL, but the comment documents the wrong path and the test exercises the non-production order.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 174456aed1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-19T19:50:20Z

+	if len(r.pendingTTL) >= r.pendingTTLCap {
+		// Cap reached: fall back to the immediate-orphan path so
+		// memory stays bounded. Already-buffered entries can still
+		// be drained by later state-inits.
+		r.pendingTTLOverflow++
+		r.orphanTTLCount++
+		return


Fail closed when pending TTL buffer reaches cap

When pendingTTL hits pendingTTLCap, this branch drops subsequent unknown-kind TTLs by counting them as orphans instead of buffering them. In real snapshot order (!redis|ttl| before !st|/!zs|/!stream|), those dropped entries can belong to valid set/zset/stream keys that are parsed later, so their expire_at_ms is permanently lost and restored data becomes non-expiring. This can happen on legitimate large datasets (e.g., more than 1,048,576 expiring wide-column keys with default cap), and the code only warns instead of preventing partial-corruption output.

Useful? React with 👍 / 👎.

Codex P1 on round 4: the r4 fix capped pendingTTL but silently counted overflow entries as orphans. That's a data-loss path on legitimate large datasets — in real Pebble snapshot order (`!redis|ttl|` < `!st|`/`!stream|`/`!zs|`), an overflowed TTL likely belongs to a wide-column key arriving later in the scan, and dropping it permanently loses `expire_at_ms`. Restored data ends up non-expiring without the operator noticing — only a warning gets logged. Fix: replace the silent-orphan fallback with fail-closed semantics. Three modes determined by pendingTTLCap: - cap > 0 and buffer NOT full: buffer the entry as before. - cap == 0: counter-only mode (operator explicit opt-out; orphan-count without buffering). - cap > 0 and buffer FULL: fail closed with new sentinel ErrPendingTTLBufferFull. Operator must raise the cap via WithPendingTTLCap or investigate the snapshot for corruption. Caller audit (per /loop standing instruction): - HandleTTL's redisKindUnknown branch now CAN return non-nil error in a new condition. Existing callers (all in test files) already check the error. Verified via `grep -rn '\.HandleTTL(' --include=*.go`: every call site has `if err := db.HandleTTL(...); err != nil { ... }`. No silent swallow. - The cmd/elastickv-snapshot-decode driver (Phase 0a follow-up) will see the new error class and can either bail or call WithPendingTTLCap before retrying. - Tests TestRedisDB_PendingTTLFailsClosedAtCap (renamed from TestRedisDB_PendingTTLBoundedByCap) and TestRedisDB_WithPendingTTLCapZeroOpts (renamed from TestRedisDB_WithPendingTTLCapZeroDisablesBuffering) pin the new semantics. The existing TestRedisDB_OrphanTTLCountedNotBuffered drives 10,000 TTLs with default cap=1M, well under the cap so no behavior change. - parkUnknownTTL signature changed from `func(...)` to `func(...) error`. Single caller (HandleTTL) updated. Self-review: 1. Data loss - opposite of the original concern: fail-closed prevents silent TTL drops. Operators who explicitly accept the lossy path opt in via WithPendingTTLCap(0). 2. Concurrency - no new shared state; orphanTTLCount unchanged for the cap=0 counter mode. 3. Performance - one comparison per unknown-kind TTL. Failed- closed path is a clean Go error return; no allocation amplification. 4. Consistency - matches the "fail closed" pattern called out in CLAUDE.md (replication/HLC/snapshot reads). Error type makes the failure mode legible to the caller. 5. Coverage - 2 renamed tests pin the new semantics. The error path is exercised at the cap+1 record; the counter-only mode stays lossy by explicit opt-in.

bootjp · 2026-05-19T19:55:37Z

@claude review

claude · 2026-05-19T19:55:54Z

Claude finished @bootjp's task in 4m 11s —— View job

Code Review: PR #790 — Redis zset encoder (Phase 0a) — Round 6

Tasks

Gather context from PR description and prior reviews (rounds 1–5)
Read commit 6 (800cc8c): fail closed on pendingTTL overflow
Read internal/backup/redis_string.go (parkUnknownTTL changes)
Read internal/backup/redis_zset.go (docstring / scan-order)
Read internal/backup/redis_zset_test.go (new tests, base64 assertion)
Check all outstanding issues from Round 5
Post comprehensive review

Commit 6 (`800cc8c`) — fail-closed on pendingTTL overflow

The Codex P1 finding from Round 5 is correctly addressed. The three-mode design is sound:

pendingTTLCap	buffer state	result
0	—	`orphanTTLCount++`, no error (counter-only opt-in)
> 0	not full	buffer entry, `return nil`
> 0	full	`pendingTTLOverflow++`, `return ErrPendingTTLBufferFull`

Key correctness properties verified:

ErrPendingTTLBufferFull is well-documented (lines 75–91) with recovery guidance pointing to WithPendingTTLCap.
pendingTTLOverflow is still tracked so Finalize's redis_orphan_ttl warning can surface both the count and the cap context (pending_ttl_buffer_overflow / pending_ttl_buffer_cap fields), letting operators distinguish corruption from an undersized cap.
HandleTTL's redisKindUnknown case propagates parkUnknownTTL's error to the caller. All test call sites already check HandleTTL errors.
TestRedisDB_PendingTTLFailsClosedAtCap: correctly asserts orphanTTLCount == 0 at intake (caller must handle the error, not Finalize).
TestRedisDB_WithPendingTTLCapZeroOpts: counter-only mode, no error, immediate increment.
TestRedisDB_WithPendingTTLCapNegativeCoercedToZero: input-sanitisation guard.

Issues

Low — `parkUnknownTTL`: false error on duplicate-key-at-cap (carried from Round 5, new shape)

The Round-5 concern changes shape under the fail-closed semantics. When the buffer is exactly at cap and a TTL record arrives for a key already in pendingTTL:

if len(r.pendingTTL) >= r.pendingTTLCap {  // true — key exists, len == cap
    r.pendingTTLOverflow++
    return ErrPendingTTLBufferFull          // ← wrong: key IS already buffered
}

The key is in the buffer (with the previous expiry) and will be claimed by claimPendingTTL when the typed record arrives. But the caller sees ErrPendingTTLBufferFull — a false error that may cause the snapshot decoder to abort the decode for a key that was actually handled correctly.

A healthy Pebble snapshot never emits duplicate !redis|ttl| records for the same key, so this only triggers on corrupt snapshots. Still, returning an error for something that succeeded is a correctness violation, and the fix is a one-liner:

func (r *RedisDB) parkUnknownTTL(userKey []byte, expireAtMs uint64) error {
    if r.pendingTTLCap == 0 {
        r.orphanTTLCount++
        return nil
    }
    uk := string(userKey)
    if _, alreadyBuffered := r.pendingTTL[uk]; alreadyBuffered {
        r.pendingTTL[uk] = expireAtMs // latest-wins; no size change
        return nil
    }
    if len(r.pendingTTL) >= r.pendingTTLCap {
        r.pendingTTLOverflow++
        return cockroachdberr.Wrapf(ErrPendingTTLBufferFull, ...)
    }
    r.pendingTTL[uk] = expireAtMs
    return nil
}

Fix this →

Medium — `HandleZSetLegacyBlob` docstring states wrong scan order for `!redis|ttl|` (carried from Round 3)

!redis|ttl|k   → HandleTTL → redisKindUnknown → pendingTTL["k"] = ms
!redis|zset|k  → HandleZSetLegacyBlob → zsetState() → claimPendingTTL() → st.expireAtMs = ms

The code is correct — zsetState() → claimPendingTTL() handles this. But the comment documents the wrong path and creates false confidence that the redisKindZSet HandleTTL shortcut is what fires in production (it isn't — that only fires if a wide-column row arrived before the TTL, which would also be non-standard Pebble order).

The TestRedisDB_ZSetLegacyBlobWithInlineTTL docstring (line 779–782) makes the same wrong claim, saying !redis|zset|<k> "sorts before !redis|ttl|."

Fix this →

Low — Missing test for real-world TTL-first ordering on legacy blobs (carried from Round 3)

TestRedisDB_ZSetLegacyBlobWithInlineTTL exercises blob-first-then-TTL (the redisKindZSet HandleTTL fast path). The actual Pebble ordering — !redis|ttl|k arrives before !redis|zset|k — goes through pendingTTL → claimPendingTTL(), which is untested for legacy blobs. A companion test would pin the real path:

func TestRedisDB_ZSetLegacyBlobTTLArrivesBeforeBlob(t *testing.T) {
    t.Parallel()
    db, root := newRedisDB(t)
    // Real-world Pebble order: !redis|ttl|k < !redis|zset|k (t < z).
    if err := db.HandleTTL([]byte("k"), encodeTTLValue(fixedExpireMs)); err != nil {
        t.Fatal(err)
    }
    value := encodeZSetLegacyBlobValue(t, []zsetLegacyEntry{{member: "m", score: 1}})
    if err := db.HandleZSetLegacyBlob(zsetLegacyBlobKey("k"), value); err != nil {
        t.Fatal(err)
    }
    if err := db.Finalize(); err != nil { t.Fatal(err) }
    got := readZSetJSON(t, filepath.Join(root, "redis", "db_0", "zsets", "k.json"))
    if zsetFloat(t, got, "expire_at_ms") != float64(fixedExpireMs) {
        t.Fatalf("expire_at_ms = %v want %d — pendingTTL path for legacy blob failed",
            got["expire_at_ms"], fixedExpireMs)
    }
}

Fix this →

Low — Base64 assertion is non-functional (`redis_zset_test.go:346–348`, carried from Round 2)

if envelope["base64"] == "" {

envelope["base64"] is typed as any. Comparing an any to a string literal "" with == compares the interface value to the string — this evaluates to false whenever the map value is non-nil (even if the string payload is "garbage"), so a wrong base64 payload still passes. The assertion should be:

payload, ok := envelope["base64"].(string)
if !ok || payload == "" {
    t.Fatalf("base64 envelope missing payload: %v", envelope)
}
decoded, err := base64.StdEncoding.DecodeString(payload)
if err != nil {
    t.Fatalf("invalid base64 payload %q: %v", payload, err)
}
if !bytes.Equal(decoded, []byte{0x80, 0xff, 0x01}) {
    t.Fatalf("decoded member = %v, want [0x80 0xff 0x01]", decoded)
}

Add "bytes" and "encoding/base64" to the import block.

Fix this →

Nits (carried from prior rounds, still unaddressed)

//nolint:gosec suppressions in test helpers (lines 50, 67, 77, 90, 102) — five integer-to-uint casts in encodeZSetMetaValue, zsetMetaKey, zsetMemberKey, zsetScoreKey, zsetMetaDeltaKey. Per CLAUDE.md: "Avoid //nolint — refactor instead." A putUserKeyLen helper (as CodeRabbit proposed) or a bounds-panic before each cast removes all five suppressions cleanly.

TestRedisDB_ZSetLegacyBlobRejectsMalformedKey contradictory comment (lines 837–840) — the comment block says "key with no trailing user-key bytes — parser must still accept it (empty user key is technically valid Redis)" then immediately says "Use a key that doesn't have the prefix to trigger the parse failure." These contradict each other. Rename to TestRedisDB_ZSetLegacyBlobRejectsMissingPrefix and remove the contradictory first sentence.

Per-record []byte() allocations — []byte(RedisZSetMetaDeltaPrefix) / []byte(RedisZSetMetaPrefix) / []byte(RedisZSetMemberPrefix) at lines 132, 361, 364, 376 create a new allocation on every dispatched record. Hoisting to package-level var eliminates ~4 allocations per record in the snapshot hot path.

Confirmed correct in commit 6

Three-mode parkUnknownTTL is correct: cap==0 (counter-only), not-full (buffer), full (fail-closed) ✓
pendingTTLOverflow incremented before the error return so Finalize warning carries the count even if the caller swallows ErrPendingTTLBufferFull ✓
TestRedisDB_PendingTTLFailsClosedAtCap correctly asserts orphanTTLCount == 0 at intake ✓
TestRedisDB_WithPendingTTLCapZeroOpts correctly asserts no error, immediate orphan count ✓
Rename from *_BoundedByCap / *_DisablesBuffering → *_FailsClosedAtCap / *_CapZeroOpts is accurate ✓
All HandleTTL call sites already check the returned error ✓

Summary

Commit 6's fail-closed semantics are architecturally correct. The remaining open items — all carried since earlier rounds — are: a false error in parkUnknownTTL when a duplicate TTL arrives at cap (the entry IS buffered; returning an error is wrong), the wrong scan-order claim in the HandleZSetLegacyBlob docstring and test comment, the missing real-world TTL-first ordering test for legacy blobs, and the non-functional base64 assertion. The //nolint:gosec suppressions and contradictory test comment are style nits.

Mirror of PR #790 round 5 codex P1 fix: the cap-with-silent-orphan fallback introduced in r3 is itself a data-loss path. In real Pebble snapshot order (`!redis|ttl|` < `!st|`/`!stream|`/`!zs|`), an overflowed TTL likely belongs to a wide-column key arriving later in the scan; dropping it permanently loses `expire_at_ms`. Fix: replace the silent-orphan fallback with fail-closed semantics. Three modes: - cap > 0, buffer NOT full: buffer the entry as before. - cap == 0: counter-only mode (operator explicit opt-out). - cap > 0, buffer FULL: return ErrPendingTTLBufferFull. Operator must raise WithPendingTTLCap or accept counter-only via cap=0. Caller audit (per /loop standing instruction): same as the parallel PR #790 r5 fix. HandleTTL's redisKindUnknown branch CAN return a non-nil error in a new condition. All call sites check the error (grep -rn '\.HandleTTL(' --include=*.go). parkUnknownTTL changes signature from `func(...)` to `func(...) error`; single caller (HandleTTL) updated. Test renames: - TestRedisDB_PendingTTLBoundedByCap → TestRedisDB_PendingTTLFailsClosedAtCap - TestRedisDB_WithPendingTTLCapZeroDisablesBuffering → TestRedisDB_WithPendingTTLCapZeroOpts Self-review: same shape as PR #790 r5 commit's review. Data loss avoidance via fail-closed; semantics match CLAUDE.md's pattern for replication/HLC/snapshot invariants.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 800cc8c16d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-19T20:00:40Z

+	if len(r.pendingTTL) >= r.pendingTTLCap {
+		// Fail closed: refuse to silently drop a TTL that may
+		// belong to a wide-column key arriving later in the
+		// snapshot scan. The operator should raise the cap
+		// (WithPendingTTLCap) or investigate the snapshot for
+		// corruption. We still increment pendingTTLOverflow so
+		// the Finalize warning surfaces the count even if the
+		// caller swallows the error.
+		r.pendingTTLOverflow++
+		return cockroachdberr.Wrapf(ErrPendingTTLBufferFull,
+			"buffer at cap=%d (user_key_len=%d)", r.pendingTTLCap, len(userKey))
+	}
+	r.pendingTTL[string(userKey)] = expireAtMs


Bound pending TTL buffer by bytes, not entry count

parkUnknownTTL copies each unknown-key TTL into pendingTTL as string(userKey), so memory usage scales with key length, but the guard only checks len(r.pendingTTL) against pendingTTLCap. With snapshot keys allowed up to 1 MiB, a relatively small number of large unknown TTL keys can consume hundreds of MiB or more before hitting the entry cap, so the intended OOM protection is ineffective in this path. This is especially risky on corrupted/adversarial snapshots where many unmatched TTL records are possible.

Useful? React with 👍 / 👎.

Codex P1 on round 5: the entry-count cap introduced in r4 doesn't actually bound memory. Redis user keys can be up to 1 MiB each, so an entry-count cap of 1 MiB entries permits up to 1 TiB of buffered key bytes — defeating the OOM protection on adversarial snapshots with large keys. Fix: replace entry-count cap with a byte budget. Each entry costs `len(userKey) + 8` bytes (the 8 is the uint64 expireAtMs payload; Go-map bucket overhead is intentionally NOT counted so the budget stays deterministic for operators). The budget is decremented when claimPendingTTL drains an entry, so a long-running scan that drains as it goes can reuse the budget. Renames: - WithPendingTTLCap → WithPendingTTLByteCap (the cap is now bytes) - defaultPendingTTLCap → defaultPendingTTLBytesCap (64 MiB) - pendingTTLCap field → pendingTTLBytesCap (struct field) - Added pendingTTLBytes counter (current cumulative cost) - Added pendingTTLEntryOverheadBytes constant (= 8) Caller audit (per /loop standing instruction): - WithPendingTTLCap was introduced in r3/r4 of this PR; no external caller has consumed it yet. Verified via `grep -rn 'WithPendingTTLCap|pendingTTLCap\b' --include=*.go`: zero matches in the full tree (the rename is total). - parkUnknownTTL's signature unchanged (still returns error). The failure condition changes from "entry count >= cap" to "would exceed byte budget"; same fail-closed semantics, just measured in bytes. - claimPendingTTL gains a `r.pendingTTLBytes -= ...` line. Single caller (the wide-column state-init helpers); accounting matches parkUnknownTTL's add side exactly. - Finalize warning surfaces `pending_ttl_buffer_bytes_cap` instead of the old `pending_ttl_buffer_cap`. Field name change in the observable warning; no programmatic consumer exists yet. Tests renamed/added: - TestRedisDB_PendingTTLFailsClosedAtByteCap (renamed from TestRedisDB_PendingTTLFailsClosedAtCap; switched to byte-cap semantics with 8-byte keys × 16-byte cost). - New TestRedisDB_PendingTTLByteCapBoundedByLargeKey pins the exact bug class codex flagged: a single 100-byte key blocks under a 64-byte budget. - New TestRedisDB_PendingTTLByteBudgetReclaimedOnClaim pins that draining via claimPendingTTL frees byte budget. - TestRedisDB_WithPendingTTLByteCapZeroOpts (renamed). - TestRedisDB_WithPendingTTLByteCapNegativeCoercedToZero (renamed). Self-review: 1. Data loss - byte-budget cap is a strictly tighter bound than the previous entry-count cap; no new data-loss surface. fail-closed semantics from r5 unchanged. 2. Concurrency - pendingTTLBytes is per-RedisDB; same single- threaded contract as the rest of the struct. 3. Performance - one int comparison + add per parkUnknownTTL; one int subtract per claimPendingTTL. O(1) overhead. 4. Consistency - byte budget matches the OOM protection reviewer asked for, sized at 64 MiB to comfortably fit any realistic legitimate workload. 5. Coverage - 5 tests total (3 new, 2 renamed). All pass with -race. Existing TestRedisDB_OrphanTTLCountedNotBuffered still passes with the default 64 MiB budget (10K small keys ≈ 180 KB).

bootjp · 2026-05-19T20:08:28Z

@claude review

claude · 2026-05-19T20:08:43Z

Claude finished @bootjp's task in 3m 44s —— View job

Code Review: PR #790 — Redis zset encoder (Phase 0a) — Round 7

Tasks

Gather context from PR description and prior reviews (rounds 1–6)
Read internal/backup/redis_string.go (commit 7: byte-bound pendingTTL)
Read internal/backup/redis_zset.go (docstring, outstanding items)
Read internal/backup/redis_zset_test.go (base64 assertion, legacy TTL test)
Read internal/backup/redis_string_test.go (new byte-cap tests)
Check all outstanding issues from Round 6
Post review

Commit 7 (`8eca5cd`) — byte-budget pendingTTL

Architecturally correct and well-tested. Key properties verified:

pendingTTLBytes tracks cumulative byte cost; pendingTTLBytesCap (default 64 MiB = 64 << 20) is the budget
parkUnknownTTL:478 charges cost := len(userKey) + pendingTTLEntryOverheadBytes (8 bytes for the uint64 payload) — bounds the map regardless of key size, fixing the entry-count-cap OOM bypass on 1 MiB keys
claimPendingTTL:522 correctly decrements pendingTTLBytes -= len(uk) + pendingTTLEntryOverheadBytes, so the budget is reclaimed when an entry is drained
Finalize:562 reports pending_ttl_buffer_bytes_cap (renamed from pending_ttl_buffer_cap) in the orphan warning ✓
TestRedisDB_PendingTTLFailsClosedAtByteCap arithmetic verified: 8 "orphan-N" keys × 16 bytes each = 128 = byteCap; 9th entry fails closed ✓
TestRedisDB_PendingTTLByteCapBoundedByLargeKey: 100-byte key + 8 overhead = 108 > 64-byte cap ✓
TestRedisDB_PendingTTLByteBudgetReclaimedOnClaim: drain k0 frees 10 bytes → subsequent entry fits ✓

Issues

Low — `parkUnknownTTL`: false error on duplicate-key-at-cap (carried from R5/R6)

When the byte budget is exactly at capacity and a TTL record arrives for a key already in pendingTTL, the code reaches the overflow branch:

// redis_string.go:479
if r.pendingTTLBytes+cost > r.pendingTTLBytesCap {
    r.pendingTTLOverflow++
    return cockroachdberr.Wrapf(ErrPendingTTLBufferFull, ...)
}

Since the key is already in the map (pendingTTLBytes == cap and cost == prior entry's cost), this returns a false error — the key IS buffered and will be claimed correctly by claimPendingTTL. A healthy Pebble snapshot never emits duplicate !redis|ttl| records for the same key, so this is unreachable in practice. But returning an error for a successfully-handled key is a correctness violation that a one-line pre-check eliminates:

func (r *RedisDB) parkUnknownTTL(userKey []byte, expireAtMs uint64) error {
    if r.pendingTTLBytesCap == 0 {
        r.orphanTTLCount++
        return nil
    }
    uk := string(userKey)
    if _, alreadyBuffered := r.pendingTTL[uk]; alreadyBuffered {
        r.pendingTTL[uk] = expireAtMs // latest-wins; no byte change
        return nil
    }
    cost := len(userKey) + pendingTTLEntryOverheadBytes
    if r.pendingTTLBytes+cost > r.pendingTTLBytesCap {
        r.pendingTTLOverflow++
        return cockroachdberr.Wrapf(ErrPendingTTLBufferFull, ...)
    }
    r.pendingTTL[uk] = expireAtMs
    r.pendingTTLBytes += cost
    return nil
}

Fix this →

Medium — `HandleZSetLegacyBlob` docstring states wrong scan order for TTL (carried from R3/R4/R5/R6)

redis_zset.go:246–257 states twice that !redis|zset| sorts before !redis|ttl|:

// The legacy prefix `!redis|zset|` lex-sorts BEFORE `!zs|...` and
// BEFORE `!redis|ttl|`, …
//
// `!redis|zset|` ALSO sorts BEFORE `!redis|ttl|`, so an inline TTL …

!redis|ttl|k   → HandleTTL → redisKindUnknown → pendingTTL["k"] = ms
!redis|zset|k  → HandleZSetLegacyBlob → zsetState() → claimPendingTTL() → st.expireAtMs = ms

The code is correct — zsetState() → claimPendingTTL() handles the real ordering. But the docstring documents the wrong path and creates false confidence that the redisKindZSet HandleTTL shortcut fires in production (it only fires when a wide-column row arrived before the TTL, which itself is non-standard Pebble order for zsets).

Fix this →

Low — Missing test for real-world TTL-first ordering on legacy blobs (carried from R3/R4/R5/R6)

TestRedisDB_ZSetLegacyBlobWithInlineTTL exercises blob-first → TTL-second (the redisKindZSet HandleTTL shortcut). In a real Pebble scan !redis|ttl|k arrives before !redis|zset|k (since t < z), which goes through the pendingTTL → claimPendingTTL() path. That path is exercised by TestRedisDB_ZSetTTLArrivesBeforeRows for wide-column records but not for legacy blobs. A companion test:

func TestRedisDB_ZSetLegacyBlobTTLArrivesBeforeBlob(t *testing.T) {
    t.Parallel()
    db, root := newRedisDB(t)
    // Real Pebble order: !redis|ttl|k < !redis|zset|k (t < z).
    if err := db.HandleTTL([]byte("k"), encodeTTLValue(fixedExpireMs)); err != nil {
        t.Fatal(err)
    }
    value := encodeZSetLegacyBlobValue(t, []zsetLegacyEntry{{member: "m", score: 1}})
    if err := db.HandleZSetLegacyBlob(zsetLegacyBlobKey("k"), value); err != nil {
        t.Fatal(err)
    }
    if err := db.Finalize(); err != nil { t.Fatal(err) }
    got := readZSetJSON(t, filepath.Join(root, "redis", "db_0", "zsets", "k.json"))
    if zsetFloat(t, got, "expire_at_ms") != float64(fixedExpireMs) {
        t.Fatalf("expire_at_ms = %v want %d — pendingTTL path for legacy blob failed",
            got["expire_at_ms"], fixedExpireMs)
    }
}

Fix this →

Low — Base64 assertion is non-functional (`redis_zset_test.go:346`, carried from R2/R3/R4/R5/R6)

if envelope["base64"] == "" {
    t.Fatalf("base64 envelope missing payload: %v", envelope)
}

envelope["base64"] is typed as any. The == comparison against the string "" compares the interface directly to a string literal — this evaluates to false (not equal) whenever the map value is a non-nil string, so a corrupt base64 payload still passes. A full round-trip decode:

payload, ok := envelope["base64"].(string)
if !ok || payload == "" {
    t.Fatalf("base64 envelope missing payload: %v", envelope)
}
decoded, err := base64.StdEncoding.DecodeString(payload)
if err != nil {
    t.Fatalf("invalid base64 payload %q: %v", payload, err)
}
if !bytes.Equal(decoded, []byte{0x80, 0xff, 0x01}) {
    t.Fatalf("decoded member = %v, want [0x80 0xff 0x01]", decoded)
}

Add "bytes" and "encoding/base64" to the import block.

Fix this →

Nits (carried from prior rounds, still unaddressed)

//nolint:gosec suppressions (lines 50, 67, 77, 90, 102 in redis_zset_test.go) — per CLAUDE.md: "Avoid //nolint — refactor instead." The five uint32(len(userKey)) casts in encodeZSetMetaValue, zsetMetaKey, zsetMemberKey, zsetScoreKey, zsetMetaDeltaKey can be made obviously safe with a bounds panic (or a putUserKeyLen helper as CodeRabbit proposed), eliminating all five suppressions.

TestRedisDB_ZSetLegacyBlobRejectsMalformedKey contradictory comments (lines 837–840) — the test outer docstring says "key with no trailing user-key bytes fails parse" but the inner comment says "empty user key is technically valid Redis — use a key without the prefix to trigger failure." The test uses "not-the-right-prefix|k" (no prefix match), not an empty user-key. Rename to TestRedisDB_ZSetLegacyBlobRejectsMissingPrefix and remove the contradictory sentence.

Per-record []byte() allocations — []byte(RedisZSetMetaDeltaPrefix), []byte(RedisZSetMetaPrefix), []byte(RedisZSetMemberPrefix) at lines 132, 361, 364, 376 create one allocation per dispatched record in the snapshot hot path. Package-level var hoists (as CodeRabbit proposed) eliminate all four.

Confirmed correct across all 7 commits

Byte-budget accounting is exact: parkUnknownTTL charges and claimPendingTTL refunds the same len(key) + 8 cost ✓
pendingTTLBytes drains to 0 after Finalize processes all typed records (verified by TestRedisDB_PendingTTLByteBudgetReclaimedOnClaim) ✓
WithPendingTTLByteCap(0) → counter-only, no error; negative → coerced to 0 ✓
Finalize warning carries pending_ttl_buffer_bytes_cap and pending_ttl_buffer_overflow when overflow occurred ✓
sawWide + markZSetWide() mirror adapter/redis_compat_helpers.go:610-620 exactly ✓
NaN fail-closed in both HandleZSetMember and decodeZSetLegacyBlobValue ✓
redisZSetLegacyProtoPrefix = []byte{0x00, 'R', 'Z', 0x01} matches adapter/redis_storage_codec.go:15 ✓

Summary

Commit 7's byte-budget approach is architecturally correct and well-tested — it closes the OOM vector that an entry-count cap left open for snapshots with large keys. The five open issues all carry from prior rounds: the most impactful remaining gap is the wrong scan-order claim in HandleZSetLegacyBlob's docstring (and the matching test comment), plus the missing test for the real-world TTL-first ordering on legacy blobs. The duplicate-key-at-cap false error in parkUnknownTTL is unreachable in healthy snapshots but worth fixing for correctness. The base64 assertion and the //nolint nits are straightforward one-line fixes.

…tries Mirror of PR #790 round 6 codex P1 fix. The entry-count cap from r3/r4 doesn't actually bound memory (Redis user keys up to 1 MiB each → entry-count cap permits N × 1 MiB of buffered key bytes). Fix: replace entry-count cap with a byte budget (pendingTTLBytesCap, default 64 MiB). claimPendingTTL decrements the byte counter so a long-running scan that drains as it goes can reuse the budget. Renames (no external callers existed — symbols introduced in r3/r4 of this PR): - WithPendingTTLCap → WithPendingTTLByteCap - pendingTTLCap → pendingTTLBytesCap - defaultPendingTTLCap → defaultPendingTTLBytesCap (64 MiB) - Added pendingTTLBytes counter - Added pendingTTLEntryOverheadBytes (= 8) Caller audit (per /loop standing instruction): WithPendingTTLCap had zero external callers in this branch's tree (`grep -rn 'WithPendingTTLCap|pendingTTLCap\b' --include=*.go` returns nothing post-rename). parkUnknownTTL keeps its error return signature; only the comparison shifts from entry count to byte budget. claimPendingTTL gains a counter-decrement matching parkUnknownTTL's add side. Tests renamed/added (same as PR #790 r6): - TestRedisDB_PendingTTLFailsClosedAtByteCap (renamed) - TestRedisDB_PendingTTLByteCapBoundedByLargeKey (new) - TestRedisDB_PendingTTLByteBudgetReclaimedOnClaim (new) - TestRedisDB_WithPendingTTLByteCapZeroOpts (renamed) - TestRedisDB_WithPendingTTLByteCapNegativeCoercedToZero (renamed) Self-review: 1. Data loss - byte budget is a strictly tighter memory bound than the previous entry-count cap. 2. Concurrency - no new shared state. 3. Performance - one int comparison + add per parkUnknownTTL; one subtract per claimPendingTTL. 4. Consistency - matches PR #790 r6 exactly; the two PRs share the pendingTTL infrastructure and the fix is identical. 5. Coverage - 5 tests (3 new, 2 renamed). All pass with -race.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8eca5cdf88

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-19T20:15:33Z

+	r.pendingTTL[string(userKey)] = expireAtMs
+	r.pendingTTLBytes += cost


Avoid double-counting bytes when buffering duplicate TTL keys

When parkUnknownTTL receives the same userKey more than once before that key is claimed, the map write overwrites the previous TTL value but pendingTTLBytes is incremented on every call. This inflates the byte budget accounting and can trigger ErrPendingTTLBufferFull even though the actual buffered map size has not grown, causing unnecessary backup failures on inputs that replay or duplicate !redis|ttl| records.

Useful? React with 👍 / 👎.

chatgpt-codex-connector Bot reviewed May 19, 2026

View reviewed changes

gemini-code-assist Bot reviewed May 19, 2026

View reviewed changes

coderabbitai Bot reviewed May 19, 2026

View reviewed changes

bootjp mentioned this pull request May 19, 2026

backup: Redis stream encoder (Phase 0a) #791

Open

4 tasks

chatgpt-codex-connector Bot reviewed May 19, 2026

View reviewed changes

		r.pendingTTL[string(userKey)] = expireAtMs
		r.pendingTTLBytes += cost

Conversation

bootjp commented May 19, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Self-review (5 lenses)

Caller audit (per /loop standing instruction)

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram

Estimated Code Review Effort

Possibly Related PRs

Poem

Uh oh!

bootjp commented May 19, 2026

Uh oh!

claude Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: PR #790 — Redis zset encoder (Phase 0a)

Wire format

Issues

Medium — TTL scan-order gap not tested (functional risk)

Low — marshalRedisZSetScore: no unreachable guard for NaN

Confirmed correct

Summary

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

bootjp commented May 19, 2026

Uh oh!

claude Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: PR #790 — Redis zset encoder (Phase 0a)

P1 Fix — pendingTTL buffering (commit 2)

Wire format and codec

Confirmed correct — design decisions

Issues

Low — Base64 test assertion is shallow (redis_zset_test.go:314-316)

Nit — //nolint:gosec in test key-builder helpers

Nit — Per-record []byte(prefix) allocations (CodeRabbit)

Future work (not blocking)

Summary

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

bootjp commented May 19, 2026

Uh oh!

claude Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: PR #790 — Redis zset encoder (Phase 0a) — Round 3

Commit 3: HandleZSetLegacyBlob — correctness

Medium — Incorrect scan-order claim; wrong test order for legacy-blob TTL

bootjp commented May 19, 2026 •

edited by coderabbitai Bot

Loading

Caller audit (per `/loop` standing instruction)

coderabbitai Bot commented May 19, 2026 •

edited

Loading

claude Bot commented May 19, 2026 •

edited

Loading

Low — `marshalRedisZSetScore`: no unreachable guard for NaN

claude Bot commented May 19, 2026 •

edited

Loading

P1 Fix — `pendingTTL` buffering (commit 2)

Low — Base64 test assertion is shallow (`redis_zset_test.go:314-316`)

Nit — `//nolint:gosec` in test key-builder helpers

Nit — Per-record `[]byte(prefix)` allocations (CodeRabbit)

claude Bot commented May 19, 2026 •

edited

Loading

Commit 3: `HandleZSetLegacyBlob` — correctness

Low — `TestRedisDB_ZSetLegacyBlobRejectsMalformedKey` has contradictory comments and tests the wrong boundary

claude Bot commented May 19, 2026 •

edited

Loading

Commit 4 (`ab88f3e`) — Wide-column supersedes legacy blob

Medium — `HandleZSetLegacyBlob` docstring states the wrong scan order for TTL (lines 252–257)

Low — `TestRedisDB_ZSetLegacyBlobWithInlineTTL` tests the non-real-world scan order

Low — Base64 assertion is non-functional (`redis_zset_test.go:346-348`, carried from rounds 2–3)

Nit — `//nolint:gosec` suppressions in test helpers (lines 50, 67, 77, 90, 102; carried from rounds 2–3)

Nit — `TestRedisDB_ZSetLegacyBlobRejectsMalformedKey` has contradictory comments (line 838–841; carried from round 3)

claude Bot commented May 19, 2026 •

edited

Loading

Commit 5 (`174456a`) — `pendingTTL` OOM bound

Low — `parkUnknownTTL`: duplicate-key at-cap edge case

Medium — `HandleZSetLegacyBlob` docstring states wrong scan order for TTL

Low — `TestRedisDB_ZSetLegacyBlobWithInlineTTL` tests non-real-world scan order

Low — Base64 assertion is non-functional (`redis_zset_test.go:346–348`)

Nit — `//nolint:gosec` in test helpers (lines 50, 67, 77, 90, 102)

Nit — `TestRedisDB_ZSetLegacyBlobRejectsMalformedKey` has contradictory comments

claude Bot commented May 19, 2026 •

edited

Loading

Commit 6 (`800cc8c`) — fail-closed on pendingTTL overflow