backup: Redis simple-type encoders (strings, HLL, TTL routing)#713
backup: Redis simple-type encoders (strings, HLL, TTL routing)#713
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughCross-platform disk-full detection and safe sidecar file opening are added; Redis backup handler is refactored to use per-database output directories, deduplicate inline TTLs, count orphan TTLs, lazily emit KEYMAP sidecars for SHA-fallbacks, and harden directory creation with symlink/hard-link/FIFO protections. ChangesRedis Backup Safety and Per-DB Support
Sequence DiagramsequenceDiagram
participant Caller as Redis Backup<br/>Caller
participant Handler as RedisDB<br/>Handler
participant TTLMgr as TTL Router
participant FileSys as Safe File Ops
participant Disk as Disk Health
participant KM as KEYMAP Writer
Caller->>Handler: HandleString(key,value,ttl)
Handler->>Handler: mark inlineTTLEmitted[key]=true
Handler->>FileSys: writeBlob(key,data) -> ensureDir(per-DB)
FileSys->>FileSys: revalidate ancestors for symlinks
FileSys->>Disk: write blob data
Disk-->>FileSys: disk-full status via isDiskFullError
alt String had inline TTL
Handler->>TTLMgr: skip TTL sidecar write
else needs sidecar TTL
Handler->>FileSys: appendTTL(strings_ttl.jsonl)
FileSys->>FileSys: openSidecarFile(path) (reject symlinks/hardlinks/FIFO)
FileSys->>Disk: write TTL entry
end
Handler->>Handler: HandleTTL(key,expiry)
Handler->>TTLMgr: if inlineTTLEmitted skip else increment orphanTTLCount
Handler->>KM: EncodeSegment(key)
alt SHA-fallback
KM->>FileSys: recordIfFallback(KEYMAP.jsonl)
FileSys->>FileSys: openSidecarFile(KEYMAP)
FileSys->>Disk: write keymap entry
end
Caller->>Handler: Finalize()
Handler->>KM: closeKeymap()
Handler->>Handler: report orphanTTLCount
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 60 minutes.Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces Redis simple-type encoders for strings, HyperLogLogs, and TTL records, featuring atomic file operations and sidecar file management. The review feedback identifies several improvement opportunities: implementing bounds on memory usage for pending TTL records to prevent OOM, parameterizing hardcoded database indices to align with documentation, optimizing directory creation to reduce redundant system calls, and expanding retriable error checks to include disk-full scenarios.
| r.pendingWideColumnTTL = append(r.pendingWideColumnTTL, redisTTLPending{ | ||
| UserKey: bytes.Clone(userKey), | ||
| ExpireAtMs: expireAtMs, | ||
| }) |
There was a problem hiding this comment.
The pendingWideColumnTTL slice grows unbounded as it accumulates TTL records for keys not yet claimed by string or HLL handlers. To prevent potential OOM issues, apply a fixed bound to this collection and reject new requests when the bound is reached, as per the repository's memory management guidelines.
References
- To prevent unbounded memory growth and potential OOM issues, apply a fixed bound to collections that can grow from external requests. Reject new requests when the bound is reached.
|
|
||
| func (r *RedisDB) writeBlob(subdir string, userKey, value []byte) error { | ||
| encoded := EncodeSegment(userKey) | ||
| dir := filepath.Join(r.outRoot, "redis", "db_0", subdir) |
There was a problem hiding this comment.
The database index db_0 is hardcoded here and in appendTTL (line 230). This contradicts the documentation stating that RedisDB encodes one logical database and that the caller is responsible for choosing the index. This hardcoding will cause collisions if multiple RedisDB instances are used for different databases within the same outRoot. Consider passing the database index as a parameter to NewRedisDB and using it to construct the path.
| func (r *RedisDB) writeBlob(subdir string, userKey, value []byte) error { | ||
| encoded := EncodeSegment(userKey) | ||
| dir := filepath.Join(r.outRoot, "redis", "db_0", subdir) | ||
| if err := os.MkdirAll(dir, 0o755); err != nil { //nolint:mnd // 0755 == standard dir mode |
There was a problem hiding this comment.
Calling os.MkdirAll for every blob write is inefficient as it performs multiple system calls to verify the directory hierarchy. Since the subdirectories ('strings', 'hll') are fixed for a given RedisDB instance, consider caching the fact that they have been created (e.g., using a map[string]struct{}) to avoid redundant I/O on the hot path.
| } | ||
| // errors.Is handles wrapped paths; both sentinel checks are stable | ||
| // for now because we never wrap them ourselves. | ||
| return errors.Is(err, io.ErrShortWrite) |
There was a problem hiding this comment.
Consider including syscall.ENOSPC (disk full) in the check for retriable errors. While io.ErrShortWrite is checked, os.File.Write typically returns a concrete system error like ENOSPC when the disk is full, which is a common scenario for backup operations that might be retriable after freeing up space.
…est' into feat/backup-phase0a-redis-simple
All four Gemini findings addressed. #189 -- pendingWideColumnTTL slice bounded. The orphan-TTL buffer is now capped at maxPendingWideColumnTTL = 1,000,000 entries (~50 MiB). Records past the cap are dropped and the warn sink reports the count at Finalize. Real production state (where wide-column type encoders eventually claim every TTL) is far under the cap; the bound only protects against malformed or adversarial snapshots. #217 -- db_0 hardcoding fixed. NewRedisDB now takes a dbIndex parameter; the per-encoder root "<outRoot>/redis/db_<idx>/" is computed from it. Two encoders with the same outRoot but different indices no longer collide. TestRedisDB_PerDBIndexRoutesIntoOwnDirectory locks in the distinction. #218 -- MkdirAll cached. Added dirsCreated map[string]struct{} on RedisDB. ensureDir() checks the map before MkdirAll so repeated writes (one per blob record) collapse to a map lookup. For a 10M-key dump this saves ~10M stat+mkdir(EEXIST) round-trips. TestRedisDB_DirsCreatedCachesMkdirAll asserts the cache is populated exactly once per directory. #382 -- ENOSPC handling. Added IsBlobAtomicWriteOutOfSpace as the explicit ENOSPC probe. IsBlobAtomicWriteRetriable continues to report only io.ErrShortWrite as retriable -- ENOSPC is intentionally NOT retriable (a backup against a full disk should surface to the operator rather than spin). The two-function split lets the master pipeline render the right alarm message.
|
All four Gemini MEDIUM items addressed in #189 — #217 — Hardcoded #218 — #382 — ENOSPC @claude review |
|
Claude finished @bootjp's task in 3m 18s —— View job Review of PR #713 (after
|
| Item | Verdict |
|---|---|
| #189 pendingWideColumnTTL OOM | ✅ Cap implemented (maxPendingWideColumnTTL = 1_000_000); documented rationale is solid |
#217 hardcoded db_0 |
✅ NewRedisDB(outRoot, dbIndex int) + per-index path + collision test |
| #218 MkdirAll per blob | ✅ dirsCreated map[string]struct{} + ensureDir short-circuit + cache test |
| #382 ENOSPC | ✅ Separated into IsBlobAtomicWriteOutOfSpace (not retriable, explicit probe) — the two-function split and the "why not retriable" comment are well-reasoned |
The constant cross-check against adapter/redis_compat_types.go also confirms the re-defined magic bytes are byte-for-byte identical to the live encoder.
Additional Findings
Medium
1. Dropped-TTL count in Finalize warning is inaccurate when the cap is hit (redis_string.go:242)
r.warn("redis_orphan_ttl",
"count", len(r.pendingWideColumnTTL), // always ≤ 1_000_000
...When 2M orphan TTLs arrive, the warning reports count: 1000000, silently hiding that another 1M were dropped. An operator reading that log would assume exactly 1M orphans are pending — not 2M. A droppedWideColumnTTL int counter incremented inside the else branch of HandleTTL's cap check would let Finalize report "dropped", r.droppedWideColumnTTL separately.
2. TestRedisDB_PendingWideColumnTTLBounded doesn't exercise the cap boundary (redis_string_test.go:312)
The test drives 10,000 entries (1% of the 1M cap) and asserts they all landed, then explicitly skips testing the drop behavior: "a 1M-record stress test in CI would be wasteful." But the critical path — the if len < maxPendingWideColumnTTL branch that drops entries — is never executed by any test. This means the cap check is untested dead code from coverage's perspective.
A straightforward fix: add a package-internal helper (or export a WithMaxPendingTTL(n int) option) that lets tests inject a lower cap, then verify that entry n+1 is dropped and len(db.pendingWideColumnTTL) == n. Alternatively, the test could drive exactly maxPendingWideColumnTTL + 1 entries in a goroutine-parallel fashion if 1M+1 iterations are too slow for CI.
Low / Nits
3. writeFileAtomic lacks fsync (redis_string.go:430)
tmp.Write → tmp.Close() → os.Rename() is not crash-durable on POSIX without an fsync between write and close. On a power failure after the rename but before the kernel flushes the page, the renamed file can contain zeros. For an offline rerunnable backup tool this may be an acceptable tradeoff (simply rerun), but it should be a deliberate, documented choice, not accidental. A //nolint:wrapcheck comment noting "no fsync by design — backup is idempotent; rerun on crash" costs one line.
4. Second closeJSONL error silently swallowed in Finalize (redis_string.go:236)
if err := closeJSONL(r.stringsTTL); err != nil && firstErr == nil {
firstErr = err
}
if err := closeJSONL(r.hllTTL); err != nil && firstErr == nil {
firstErr = err
}If both flush/close fail (possible on NFS under heavy I/O), only the strings error is returned. The HLL sidecar error is silently swallowed and the caller has no way to know its TTL sidecar is corrupt/incomplete. Consider errors.Join or at minimum a second warn-sink call for the secondary error.
5. Ordering assumption undocumented in HandleTTL (redis_string.go:203)
The PR description explains that Pebble lex order guarantees hll < str < ttl, so HandleTTL is always called after the corresponding HandleString/HandleHLL. But this invariant is not stated in the doc comment for HandleTTL itself. A caller who doesn't read the PR would have no warning that out-of-order calls silently promote the TTL to pendingWideColumnTTL. Worth one sentence in the doc comment: "Callers must have already dispatched the corresponding Handle{String,HLL} record — guaranteed by Pebble key order; see PR description."
6. Double-clamping of dbIndex (redis_string.go:148 and redis_string.go:258)
NewRedisDB already clamps dbIndex < 0 to 0, and redisDBSegment also clamps it again. The second clamp is unreachable for any caller going through NewRedisDB. It's harmless but creates a confusing impression that dbIndex might escape the clamp somewhere.
Pass Summaries
- Data loss — Atomic write (tmp+rename) is sound; symlink guard is correct; deferred tmp cleanup is correct. The missing
fsyncis the one unresolved gap (see Update docker/setup-qemu-action action to v3 #3 above). - Concurrency — Package is intentionally single-threaded; documented and appropriate.
- Performance —
ensureDircaching eliminates hot-pathstatcalls;kindByKeymap allocation is proportional and bounded;intToDecimalavoidsstrconvimport on the hot path. No regressions. - Data consistency — Magic-byte constants verified byte-for-byte against
adapter/redis_compat_types.go;math.MaxInt64clamp mirrors the live decoder; TTL sidecar writes mirror the snapshot record ordering contract. - Test coverage — 15 tests, all passing per CI. Coverage gap: the cap-boundary drop path in
HandleTTLis exercised by zero tests (see Update dependency go to v1.21.4 #2 above).
Overall: The four Gemini items are cleanly resolved; the code is well-structured and safe for its stated scope. Items #1 and #2 are worth a follow-up — the dropped-TTL count accuracy and the untested cap-boundary path are the two gaps that could mislead an operator or hide a future regression in the cap logic.
| Branch
…est' into feat/backup-phase0a-redis-simple
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 84a1c5a3e0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| case redisKindString: | ||
| return r.appendTTL(&r.stringsTTL, redisStringsTTLFile, userKey, expireAtMs) |
There was a problem hiding this comment.
Avoid duplicate TTL entries for new-format string keys
This redisKindString TTL routing appends every !redis|ttl| record to strings_ttl.jsonl, but HandleString already extracts inline TTL from new-format string values and writes that same key to the same sidecar. In current Redis writes, string commits emit both encoded string TTL and a !redis|ttl| scan-index entry (see adapter/redis_lua_context.go stringCommitElems), so expiring string keys will be duplicated in the sidecar, violating one-record-per-key assumptions and creating ambiguous restore input.
Useful? React with 👍 / 👎.
| if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil { //nolint:mnd // 0755 == standard dir mode | ||
| return nil, cockroachdberr.WithStack(err) | ||
| } | ||
| f, err := os.Create(path) //nolint:gosec // path is composed from output-root + fixed file name |
There was a problem hiding this comment.
Reject symlink targets when opening TTL sidecar files
Creating TTL sidecars with os.Create(path) follows symlinks, so if strings_ttl.jsonl or hll_ttl.jsonl is a symlink in the output tree, backup generation will truncate and overwrite the symlink target outside the intended directory. Blob writes already defend against this (writeFileAtomic refuses symlink overwrites), so sidecar creation currently bypasses the same safety boundary.
Useful? React with 👍 / 👎.
Codex round 5 raised two correctness issues: 1. P1: !redis|ttl| scan-index records duplicated TTL for new-format strings. The live Redis encoder writes both !redis|str|<k> (with TTL embedded inline in the magic-prefix header) and the scan-index !redis|ttl|<k> for every expiring string (see adapter/redis_lua_context.go stringCommitElems). HandleString wrote strings_ttl.jsonl from the inline TTL; HandleTTL then routed the redundant !redis|ttl| record to the same sidecar, violating the one-record-per-key contract sidecar consumers rely on. Track inline-emitted keys in a new inlineTTLEmitted set and drop the scan-index TTL when the same key was already covered by the inline header. Legacy (no-magic-prefix) strings still flow through the !redis|ttl| path because their TTL only lives there. 2. P2: openJSONL followed symlinks. writeFileAtomic already refuses to clobber a symlink at a blob path; openJSONL did not. A strings_ttl.jsonl or hll_ttl.jsonl symlink in the output tree would otherwise be followed by os.Create, truncating a target outside the dump tree. Mirror the writeFileAtomic Lstat guard. Tests: TestRedisDB_NewFormatStringTTLNotDuplicatedByScanIndex, TestRedisDB_OpenJSONLRefusesSymlinkOverwrite.
|
@claude review Round 2 fixes for Codex round 5 on commit 84a1c5a:
Tests: |
|
Claude finished @bootjp's task in 3m 0s —— View job Review of PR #713 — Round 2 (commit
|
| Case | inlineTTLEmitted set? |
HandleTTL behaviour |
|---|---|---|
| New-format string WITH inline TTL | ✅ Yes, in HandleString |
Skips appendTTL, drops redundant scan-index record |
| New-format string WITHOUT TTL | No (early return at expireAtMs==0) |
No !redis|ttl| record exists in snapshot — nothing to call |
| Legacy string (no magic prefix) | No (decodeRedisStringValue returns expireAtMs=0) |
Falls through to appendTTL — correct, TTL lives only in the scan-index |
The one-record-per-key invariant is preserved for all three cases. TestRedisDB_NewFormatStringTTLNotDuplicatedByScanIndex asserts exactly 1 record in the sidecar after a HandleString + HandleTTL sequence. ✓
P2 — openJSONL now refuses symlinks (redis_string.go:408)
The Lstat-before-Create guard mirrors writeFileAtomic exactly. Both have the same TOCTOU window, but for an offline backup tool running in a controlled output directory, this is an accepted and consistent tradeoff. TestRedisDB_OpenJSONLRefusesSymlinkOverwrite verifies the bait file is untouched and the error message is correct. ✓
Pass Summaries
-
Data loss — Atomic write semantics unchanged; the new
inlineTTLEmittedmap is write-only within a key's lifetime and does not affect flush/close paths. No new data-loss vectors. The prior noted issues (missingfsync, secondcloseJSONLerror swallowed inFinalize) remain but are pre-existing nits. -
Concurrency — Single-threaded by design and documented;
inlineTTLEmittedaccess is sequential. No issues. -
Performance —
inlineTTLEmittedgrows proportional to expiring strings, which is inherent and bounded by the actual keyspace. The map lookup inHandleTTLis O(1) on the already-hot path. No regressions. -
Data consistency — The deduplication logic is correct for all three string formats. Legacy strings correctly bypass
inlineTTLEmittedsincedecodeRedisStringValuereturnsexpireAtMs=0for non-magic-prefix values, leavinginlineTTLEmittedunpopulated. ✓ -
Test coverage — 17 tests now. Both regression tests exercise the correct boundary: P1 tests the sidecar count after a real duplicate sequence; P2 confirms the bait file is untouched and the error message matches.
Residual Items (Carried from Round 1 — Not Blocking)
Two medium items from the previous review pass remain open:
1. Dropped-TTL count understated when cap is hit (redis_string.go:250)
When the pendingWideColumnTTL cap (1_000_000) is hit, Finalize warns with "count", len(r.pendingWideColumnTTL) — which is always ≤ 1_000_000. If 2M orphan TTLs arrived, the operator sees count: 1000000 with no indication that another 1M were silently dropped. A droppedWideColumnTTL int counter (incremented in the else branch of the cap check) would let Finalize report "dropped", r.droppedWideColumnTTL separately.
2. TestRedisDB_PendingWideColumnTTLBounded does not exercise the drop path (redis_string_test.go)
The test drives 10K entries (1% of the 1M cap) and verifies they all land. The else branch (the actual drop) is executed by zero tests, so it's untested dead code from coverage's perspective. Adding a package-level var maxPendingWideColumnTTL = 1_000_000 instead of a const (or a withMaxPendingTTL option on RedisDB) would let tests inject a tiny cap and verify entry n+1 is dropped and len(db.pendingWideColumnTTL) == n.
Overall: The two Codex round-5 issues are cleanly resolved with correct logic and matching regression tests. Ready to merge pending the two medium carry-overs above (or deferring them to a follow-up alongside the wide-column encoders, when the cap becomes load-bearing).
| Branch
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c699c15ca5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if info, err := os.Lstat(path); err == nil && info.Mode()&os.ModeSymlink != 0 { | ||
| return nil, cockroachdberr.WithStack(cockroachdberr.Newf("backup: refusing to overwrite symlink at %s", path)) | ||
| } | ||
| f, err := os.Create(path) //nolint:gosec // path is composed from output-root + fixed file name |
There was a problem hiding this comment.
Eliminate symlink race when creating TTL sidecars
The os.Lstat check is separated from os.Create, so a process that can write the output directory can swap strings_ttl.jsonl/hll_ttl.jsonl to a symlink between those calls and still get the symlink target truncated. This reintroduces the overwrite primitive the guard is trying to prevent in concurrent or adversarial local environments; the open needs to be atomic with no-follow semantics (or written via temp+rename) to actually enforce the boundary.
Useful? React with 👍 / 👎.
| // the warning sink at Finalize. | ||
| if len(r.pendingWideColumnTTL) < maxPendingWideColumnTTL { | ||
| r.pendingWideColumnTTL = append(r.pendingWideColumnTTL, redisTTLPending{ | ||
| UserKey: bytes.Clone(userKey), |
There was a problem hiding this comment.
Avoid buffering full orphan TTL keys before Finalize
HandleTTL clones every unknown-key TTL into pendingWideColumnTTL, but Finalize only reads len(...) for warning output and never consumes the stored key/expiry payloads. On snapshots with many unsupported wide-column TTLs (especially with large user keys), this can allocate large, unnecessary memory and defeat the intended OOM protection despite the record-count cap.
Useful? React with 👍 / 👎.
…est' into feat/backup-phase0a-redis-simple
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1c70e83da4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } | ||
|
|
||
| func (r *RedisDB) writeBlob(subdir string, userKey, value []byte) error { | ||
| encoded := EncodeSegment(userKey) |
There was a problem hiding this comment.
Emit keymap entries for SHA-fallback Redis keys
writeBlob stores filenames from EncodeSegment(userKey), but this code never records KEYMAP entries when EncodeSegment takes its SHA-fallback path for long keys. For Redis keys whose encoded segment is non-reversible (>240-byte or promoted fallback cases), restore tooling cannot recover the original key bytes from *.bin/TTL metadata alone, causing irreversible key-name loss for affected records.
Useful? React with 👍 / 👎.
| if !isNewRedisStrFormat(value) { | ||
| return value, 0, nil | ||
| } |
There was a problem hiding this comment.
Stop misclassifying FF01-prefixed legacy strings
The new-format probe only checks the first two bytes (0xFF,0x01), so any legacy raw string value that happens to start with that prefix is parsed as encoded format and has bytes stripped (or can trigger a false TTL parse). This silently corrupts restored values for binary-safe legacy Redis strings matching that prefix pattern; the decoder needs a stricter discriminator than two-byte magic alone.
Useful? React with 👍 / 👎.
Codex round 6 raised two issues on commit 1c70e83: 1. P1: openJSONL had a TOCTOU window between Lstat and os.Create. A process that can write the output directory could swap the sidecar path to a symlink between the two syscalls and still get the target truncated, defeating the round-5 guard. Replace the two-step pattern with a single os.OpenFile call carrying O_WRONLY|O_CREATE|O_TRUNC|O_NOFOLLOW so the open itself refuses symlinks atomically; on Linux/macOS/BSD this returns ELOOP, on Windows O_NOFOLLOW is 0 (matches the different filesystem-permission model). Wrap the syscall error with the same "refusing to overwrite symlink" message so the existing test's substring assertion still holds. 2. P2: HandleTTL was buffering each orphan TTL record's full user key into pendingWideColumnTTL, but Finalize only ever read len(...) for the warning sink. With user keys up to 1 MiB each (store/mvcc_store.go maxSnapshotKeySize), this could allocate gigabytes for no benefit. Replace the slice with a plain `orphanTTLCount int`. The wide-column-encoder PR will re-introduce per-key tracking when those encoders actually consume the keys. Tests: - TestRedisDB_OpenJSONLRefusesSymlinkOverwrite — comment updated to reflect the atomic open semantics. - TestRedisDB_OrphanTTLCountedNotBuffered — replaces TestRedisDB_PendingWideColumnTTLBounded; asserts the count field instead of slice length.
|
@claude review Round 3 fixes for Codex round 6 on commit 1c70e83:
|
|
Claude finished @bootjp's task in 4m 1s —— View job Review of PR #713 — Round 3 (commit
|
…nd 4) Codex P1 round 7 (commit d965b65): `writeBlob` and `appendTTL` were storing user keys as `EncodeSegment(userKey)` filenames / JSONL row keys, but never recording a KEYMAP.jsonl entry when EncodeSegment took its SHA-fallback path. For Redis keys longer than the 240-byte ceiling (or those whose percent-encoded form was promoted to the fallback to disambiguate from the binary `b64.` prefix or the SHA-fallback shape), the encoded segment is non-reversible — so without a KEYMAP record, the original Redis user key bytes were lost in the backup output. Wire a lazy-opened, atomic-no-follow `KEYMAP.jsonl` writer into `RedisDB`. Both `writeBlob` and `appendTTL` now call `recordIfFallback(encoded, userKey)` which writes a `KindSHAFallback` record on first encounter. `closeKeymap` (called from Finalize) flushes + closes the file; if no fallback records were ever emitted, the empty file is removed so dumps without any non-reversible keys carry no spurious file (matches the s3 encoder's policy). Tests: - TestRedisDB_SHAFallbackKeymapped — drives a 300-byte key, asserts KEYMAP.jsonl exists and round-trips back to the original bytes. - TestRedisDB_NoKeymapWhenAllReversible — short-key dump produces no KEYMAP file.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d965b6598c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // kernel supports it (Linux, macOS, BSD) this returns ELOOP for | ||
| // a symlink path. On Windows syscall.O_NOFOLLOW is 0 (no-op), | ||
| // matching Windows's different filesystem-permission model. | ||
| const flag = os.O_WRONLY | os.O_CREATE | os.O_TRUNC | syscall.O_NOFOLLOW |
There was a problem hiding this comment.
Guard O_NOFOLLOW usage behind Unix-specific build logic
openJSONL is compiled for all targets but unconditionally references syscall.O_NOFOLLOW; this symbol is not defined on Windows, so the package no longer builds there (reproduced with GOOS=windows GOARCH=amd64 go test ./internal/backup -run TestDoesNotExist, which fails at this line). This introduces a cross-platform build regression and conflicts with the surrounding comment that implies Windows compatibility.
Useful? React with 👍 / 👎.
|
@claude review Round 4 fixes for Codex round 7 on commit d965b65:
|
|
Claude finished @bootjp's task in 4m 21s —— View job Review of PR #713 — Round 4 (commit
|
| Path | recordIfFallback called |
KEYMAP record emitted |
|---|---|---|
| Short key, no TTL | ✓ (IsShaFallback → false, returns nil) | No |
| Short key, with TTL | ✓ (IsShaFallback → false, returns nil) | No |
| Long key, no TTL | ✓ from writeBlob only |
1 record |
| Long key, with inline TTL | ✓ from writeBlob + appendTTL (via HandleString) |
2 records (same mapping) |
| HLL long key + TTL | ✓ from writeBlob + appendTTL (via HandleTTL) |
2 records (same mapping) |
The lazy-open with O_NOFOLLOW mirrors openJSONL exactly. closeKeymap's empty-file removal is correct — r.keymap.Count() is only 0 when every WriteOriginal call failed (which means the caller got an error and aborted before Finalize). LoadKeymap's "last record wins" semantics are verified: out[rec.Encoded] = rec — duplicate records with the same key produce the same mapping. ✓
P1 — FF01 ambiguity (deferred): ✓ Rationale is sound
The argument holds: the backup decoder must produce bit-for-bit identical output to what the live cluster returns for a given stored value. Changing the two-byte discriminator in the backup decoder without changing the live encoder would cause backup→restore round-trips to diverge from the live cluster's behavior — exactly the wrong invariant. A coordinated wire-format change is the right path, and a follow-up design doc is the right vehicle.
P2 round 7 — O_NOFOLLOW Windows build (not addressed in Round 4)
Codex raised that syscall.O_NOFOLLOW may not be defined on Windows and that openJSONL would break Windows builds. The Round 3 review accepted the existing openJSONL pattern based on the comment "On Windows syscall.O_NOFOLLOW is 0 (no-op)." Round 4 adds recordIfFallback with the identical pattern. This comment needs verification — if syscall.O_NOFOLLOW is genuinely absent on Windows rather than just 0, both openJSONL and recordIfFallback would fail to compile there. A one-line cross-compile check (GOOS=windows GOARCH=amd64 go build ./internal/backup/) would settle it. If the symbol is absent, the standard fix is a build-tagged shim (backup_nofollow_unix.go + backup_nofollow_other.go) or using golang.org/x/sys/unix.O_NOFOLLOW behind a build tag.
Additional Findings
Low
1. Duplicate KEYMAP records for expiring SHA-fallback keys (redis_string.go:317, redis_string.go:340)
For any expiring SHA-fallback string key, HandleString calls writeBlob → recordIfFallback (record #1) then appendTTL → recordIfFallback (record #2). Both records carry identical encoded and original bytes, so LoadKeymap sees them as the same entry. Correctness is not affected, but KEYMAP.jsonl is 2× larger than necessary for such keys. For a dump with 1M expiring long-key strings this doubles the KEYMAP file size.
A seenFallbacks map[string]struct{} on RedisDB (checked inside recordIfFallback before calling WriteOriginal) would eliminate the duplicates. Not blocking — just a space inefficiency.
2. closeKeymap swallows the keymapFile.Close() error when flush also fails (redis_string.go:393)
flushErr := r.keymap.Close()
closeErr := r.keymapFile.Close()
if flushErr == nil && closeErr != nil {
flushErr = closeErr
}If both flush and close fail (e.g., NFS under heavy I/O), only flushErr is returned. This is consistent with the pre-existing closeJSONL first-error pattern carried through the whole file, so it's not a new regression — just noting it for completeness.
Pass Summaries
- Data loss —
recordIfFallbackcorrectly emits KEYMAP records for both blob and TTL paths.O_TRUNCon lazy open is correct (fresh dump semantics).closeKeymap's empty-file removal is safe. No new data-loss vectors. - Concurrency — Single-threaded by design.
recordIfFallbacklazy-init ofr.keymapis sequential. No issues. - Performance —
IsShaFallbackis O(34) string prefix check on every blob/TTL write — negligible. The duplicate-record concern (finding Configure Renovate #1) is a space, not time, cost. - Data consistency —
LoadKeymaplast-record-wins confirmed atkeymap.go:238. Duplicate records for expiring fallback keys produce the same mapping.O_TRUNCensures a clean KEYMAP for each dump session. ✓ - Test coverage — 19 tests total.
TestRedisDB_SHAFallbackKeymappedexercises bothwriteBlobandappendTTLpaths becauseHandleStringwith inline TTL calls both. The round-trip viaLoadKeymap+rec.Original()is thorough.TestRedisDB_NoKeymapWhenAllReversiblecovers the no-fallback path. ✓
Overall: The SHA-fallback KEYMAP fix is cleanly implemented and its tests are solid. The FF01 deferral is well-argued. The only open item from Codex round 7 is the Windows build question — a quick GOOS=windows go build verifies or refutes it with one command. If it compiles, close the item; if not, a two-file build-tag shim resolves it in minutes.
| Branch
…d 5) Codex P2 round 7 (commit d965b65): `syscall.O_NOFOLLOW` is not defined on Windows, so the round-3 fix that used it directly broke the cross-platform build (`GOOS=windows GOARCH=amd64 go build ./internal/backup` failed at `undefined: syscall.O_NOFOLLOW`). Extract the platform-specific open into a small build-tagged helper: - open_nofollow_unix.go (`//go:build unix`) keeps the atomic O_NOFOLLOW open with ELOOP wrapping. - open_nofollow_windows.go (`//go:build windows`) uses Lstat then OpenFile. The residual TOCTOU window is acceptable on Windows because mounting a successful attack on the dump tree there already requires write access to the output directory plus SeCreateSymbolicLinkPrivilege — a much higher bar than the unix case the round-3 fix was hardening against. Both `openJSONL` and the keymap-creation path in `recordIfFallback` now go through `openSidecarFile`, eliminating the duplicate inline OpenFile blocks. Verified: `go test -race` passes on Linux/macOS; `GOOS=windows GOARCH=amd64 go build ./internal/backup/...` now succeeds (was the Codex repro).
|
@claude review Round 5 fix for Codex P2 round 7 on commit d965b65 (commit
Extracted the per-platform open into a build-tagged helper
Both |
|
Claude finished @bootjp's task in 3m 26s —— View job Review of PR #713 — Round 5 (commit
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6f35446404
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const flag = os.O_WRONLY | os.O_CREATE | os.O_TRUNC | syscall.O_NOFOLLOW | ||
| f, err := os.OpenFile(path, flag, 0o600) //nolint:gosec,mnd // path is composed from output-root + fixed file name; 0600 is the standard owner-only mode |
There was a problem hiding this comment.
Refuse hard-link targets when creating sidecar files
openSidecarFile now blocks symlinks, but it still opens existing files with O_TRUNC, which will truncate the underlying inode even when the path is a hard link. In a local-adversary scenario where another process can write the dump directory, it can pre-create strings_ttl.jsonl/hll_ttl.jsonl as a hard link to another writable file outside the backup tree, and this call will clobber that external file despite the symlink guard.
Useful? React with 👍 / 👎.
…est' into feat/backup-phase0a-redis-simple
Codex P2 round 9 (commit 87971ae): The round-5 unix sidecar opener used O_NOFOLLOW so the kernel refuses symlinks atomically, but it kept O_TRUNC on the same call. A local adversary who can write the output directory could pre-create strings_ttl.jsonl / hll_ttl.jsonl / KEYMAP.jsonl as a hard link to a writable file outside the dump tree (e.g. /etc/passwd) and the open's O_TRUNC would zero that inode despite the symlink guard. Restructure the unix open: 1. Open with O_WRONLY|O_CREATE|O_NOFOLLOW (no O_TRUNC). 2. fstat() the descriptor. 3. If sysStat.Nlink > 1, refuse with a wrapped error and close. 4. Otherwise call f.Truncate(0) — atomic with the file we just verified is single-linked. The Windows build (open_nofollow_windows.go) is unchanged because its symlink/hard-link attack model is materially different (SeCreateSymbolicLinkPrivilege required, ACLs gate the directory) and the platform's link-count semantics under fstat are nontrivial. Test: - TestRedisDB_OpenJSONLRefusesHardLinkClobber — pre-creates strings_ttl.jsonl as a hard link to a bait file, exercises HandleString (which lazy-opens the sidecar), asserts the "refusing to overwrite hard-linked file" error and that the bait file is untouched.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 87971aef96
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if _, ok := r.dirsCreated[dir]; ok { | ||
| return nil | ||
| } | ||
| if err := os.MkdirAll(dir, 0o755); err != nil { //nolint:mnd // 0755 == standard dir mode |
There was a problem hiding this comment.
Reject symlinked parent dirs before writing backup files
The path hardening only checks the final file (os.Lstat(path) in writeFileAtomic and O_NOFOLLOW in openSidecarFile), but it never verifies parent directories, so a pre-existing symlink like redis/db_0/strings -> /tmp/outside causes HandleString to write blobs outside outRoot. This bypasses the intended “don’t follow links” safety boundary for both blob files and JSONL sidecars whenever an attacker (or accidental setup) can place directory symlinks in the output tree.
Useful? React with 👍 / 👎.
| if err == nil { | ||
| return false | ||
| } | ||
| return errors.Is(err, syscall.ENOSPC) |
There was a problem hiding this comment.
Detect Windows disk-full errors in out-of-space helper
IsBlobAtomicWriteOutOfSpace only checks errors.Is(err, syscall.ENOSPC), which works on POSIX but misses Windows disk-full write failures (typically Win32 errno 112 / ERROR_DISK_FULL). On Windows this means real out-of-space errors can be reported as generic failures, so retry/alarm logic that depends on this helper won’t classify them correctly.
Useful? React with 👍 / 👎.
…d 7) Codex round 9 raised two issues on commit 87971ae: 1. P1: O_NOFOLLOW only blocks the trailing component. A pre-existing directory symlink anywhere up the path (e.g. `<outRoot>/redis/db_0/strings -> /tmp/outside`) let MkdirAll silently honor it, then writeFileAtomic / openSidecarFile wrote outside outRoot. ensureDir now Lstat-walks each ancestor under outRoot via assertNoSymlinkAncestors and refuses if any is a symlink. This is best-effort against TOCTOU (an adversary can race us at any boundary), but closes the much more common case of a stale or misconfigured symlink in the dump tree. Full race-free traversal would need os.Root / openat which is out of scope at this layer. 2. P2: IsBlobAtomicWriteOutOfSpace only checked syscall.ENOSPC, missing Windows disk-full failures (ERROR_DISK_FULL = 112, ERROR_HANDLE_DISK_FULL = 39). Retry/alarm logic that keys off this helper would silently misclassify Windows disk exhaustion. Move the platform-specific check into disk_full_unix.go and disk_full_windows.go; the latter accepts both Win32 codes plus ENOSPC for the Go-translated case. Tests: - TestRedisDB_RefusesSymlinkedAncestor: pre-places a `redis/db_0/strings -> /tmp/...` symlink before HandleString and asserts the ancestor refusal + that the bait dir is untouched. Verified: - go test -race ./internal/backup/... clean. - GOOS=windows GOARCH=amd64 go build ./internal/backup/... clean. - golangci-lint clean.
|
@claude review Round 7 fixes for Codex round 9 on commit 87971ae (commit
|
|
Claude finished @bootjp's task in 3m 9s —— View job Review of PR #713 — Round 7 (commit
|
| Step | What it does |
|---|---|
filepath.Rel(cleanRoot, cleanTarget) |
Computes relative path; escape-to-parent detected and refused |
lstatRefuseSymlink(cleanRoot) |
Checks the root itself first |
Loop over strings.Split(rel, sep) |
Walks each component, Lstat-checking each joined path |
if seg == "" { continue } |
Handles degenerate splits from filepath.Clean'd paths |
lstatRefuseSymlink's "return nil for non-existent path" is correct because ensureDir calls MkdirAll immediately before the check — any missing directory is impossible unless MkdirAll itself failed (and that failure is caught). The TOCTOU disclosure in the comment is the right acknowledgement.
TestRedisDB_RefusesSymlinkedAncestor pre-places <root>/redis/db_0/strings → bait-tree, drives HandleString, asserts the error message, and verifies the bait is untouched. ✓
P2 — Platform-specific isDiskFullError (disk_full_unix.go, disk_full_windows.go)
- Unix:
errors.Is(err, syscall.ENOSPC)— correct. - Windows: covers
windows.ERROR_DISK_FULL(112),windows.ERROR_HANDLE_DISK_FULL(39), andsyscall.ENOSPC(for Go's translated case). The three-caseswitchis clean and exhaustive. golang.org/x/sys v0.43.0is confirmed ingo.mod— no new dependency. ✓
TestRedisDB_IsBlobAtomicWriteOutOfSpace verifies that both a raw and errors.Join-wrapped ENOSPC round-trip through IsBlobAtomicWriteOutOfSpace. ✓
Residual Item (Carried from Round 5 — Not Yet Addressed)
Medium: open_nofollow_windows.go returns a bare OS error (open_nofollow_windows.go:25)
// Windows — the os.OpenFile error is not wrapped:
return os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600)The repo's wrapcheck linter enforces cockroachdberr.WithStack at all OS/package boundaries. This file escapes the check only because Linux CI does not cross-compile for GOOS=windows. Any future GOOS=windows golangci-lint step will flag this. One-line fix:
Additional Findings
Low / Nits
1. TestRedisDB_RefusesSymlinkedAncestor tests leaf-level only (redis_string_test.go:209)
The test places the symlink trap at <root>/redis/db_0/strings (one level below dbDir()). The assertNoSymlinkAncestors loop covers all levels including <root>/redis and <root>/redis/db_0, but these higher-level cases are not exercised by the test. The logic is the same for any level (the same lstatRefuseSymlink call), so this is a coverage note rather than a bug.
2. openJSONL bypasses ensureDir for the JSONL parent directory (redis_string.go:530)
func openJSONL(path string) (*jsonlFile, error) {
if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil { // direct MkdirAll — no ancestor check
...
}openJSONL calls os.MkdirAll directly rather than routing through ensureDir. In practice this is safe because HandleString/HandleHLL always call writeBlob → ensureDir before any appendTTL invocation — the ancestor check for the JSONL parent directory (dbDir()) is implicitly covered by the blob ensureDir call for the sibling subdir (dbDir()/strings, dbDir()/hll). But this ordering invariant is implicit and undocumented in openJSONL. A one-line comment noting the dependency — or threading openJSONL through ensureDir for its parent — would make the contract explicit for a future refactor.
3. disk_full_windows.go Windows error codes are untestable from Linux CI (disk_full_windows.go)
windows.ERROR_DISK_FULL and windows.ERROR_HANDLE_DISK_FULL are not exercised by any test in this PR (inherent constraint: //go:build windows code is excluded from Linux CI). This is the same limitation as open_nofollow_windows.go. Acceptable for a backup tool targeting server environments; worth noting for completeness.
Pass Summaries
-
Data loss —
assertNoSymlinkAncestorsis called afterMkdirAllsucceeds, beforedirsCreatedis populated. Pre-existing ancestor symlinks are caught; newly swapped ones require the TOCTOU window (documented).openJSONL's directMkdirAllbypasses the ancestor check but is safe due to call ordering (see finding Update dependency go to v1.21.4 #2). No new data-loss vectors. -
Concurrency — Single-threaded by design.
assertNoSymlinkAncestorsandlstatRefuseSymlinkare pure functions with no shared state. No issues. -
Performance —
assertNoSymlinkAncestorsis called once per unique directory (gated bydirsCreatedcache), O(path depth) Lstat calls per directory. Negligible for a backup tool.isDiskFullErroris O(1). No regressions. -
Data consistency — Magic-byte constants, deduplication logic, KEYMAP emission, and orphan-TTL counting are all unchanged.
golang.org/x/sysconfirmed ingo.mod. ✓ -
Test coverage — 21 tests.
TestRedisDB_RefusesSymlinkedAncestorexercises the new ancestor-check path end-to-end.TestRedisDB_IsBlobAtomicWriteOutOfSpacecovers ENOSPC and wrapped ENOSPC. Windows disk-full codes are untested (inherent platform gap, acceptable).
Overall: The two Round 7 Codex items are correctly implemented and each has a dedicated regression test. The single remaining open item is the unwrapped Windows OS error from Round 5 — a one-line fix or explicit deferral is all that stands between this and merge-ready.
| Branch |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e061b6d088
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // mounting a successful attack on the dump tree there already | ||
| // requires write access plus SeCreateSymbolicLinkPrivilege. | ||
| // Codex P1 round 6 (atomic open) + P2 round 7 (Windows build). | ||
| f, err := openSidecarFile(path) |
There was a problem hiding this comment.
Provide non-Unix/Windows fallbacks for backup OS helpers
This change introduces calls to openSidecarFile and isDiskFullError, but both helpers are only implemented in *_unix.go and *_windows.go; targets that are neither (for example GOOS=js and GOOS=wasip1) now fail to compile with undefined symbol errors. I verified this by running GOOS=js GOARCH=wasm go test -c ./internal/backup and GOOS=wasip1 GOARCH=wasm go test -c ./internal/backup, both of which fail at these call sites, whereas the parent commit compiles for those targets.
Useful? React with 👍 / 👎.
Codex P2 round 10 (commit e061b6d): Round 5 + round 7 split openSidecarFile and isDiskFullError into unix and windows build-tagged files, but the union doesn't cover every Go target. GOOS=js GOARCH=wasm and GOOS=wasip1 GOARCH=wasm now fail with `undefined: openSidecarFile` / `undefined: isDiskFullError` at compile time, which Codex reproduced via `GOOS=js go test -c ./internal/backup`. Add `_other.go` files behind `//go:build !unix && !windows`: - openSidecarFile falls back to Lstat-then-OpenFile. Those targets either lack syscall.O_NOFOLLOW (wasm runtimes) or have a different filesystem model (plan9), and the local-adversary threat that motivated the unix nlink+atomic-open guard does not apply to offline / sandboxed dump tooling. - isDiskFullError returns false. wasm has no disk; plan9 uses a different error vocabulary. The helper's contract treats unrecognised errors as non-retriable, so false is the safe default. Verified: - GOOS=js GOARCH=wasm go build ./internal/backup/... clean. - GOOS=wasip1 GOARCH=wasm go build ./internal/backup/... clean. - GOOS=windows GOARCH=amd64 go build ./internal/backup/... clean. - go test -race ./internal/backup/... clean.
|
@claude review Round 8 fix for Codex P2 round 10 on commit e061b6d (commit The unix/windows split for Added
Verified: |
|
Claude finished @bootjp's task in 1m 54s —— View job Review of PR #713 — Round 8 (commit
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 20236ac43b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const flag = os.O_WRONLY | os.O_CREATE | syscall.O_NOFOLLOW | ||
| f, err := os.OpenFile(path, flag, 0o600) //nolint:gosec,mnd // path is composed from output-root + fixed file name; 0600 is the standard owner-only mode |
There was a problem hiding this comment.
Reject non-regular sidecar paths before opening
openSidecarFile opens the TTL sidecar with os.O_WRONLY|os.O_CREATE before checking what kind of file it is, so a pre-existing FIFO at strings_ttl.jsonl/hll_ttl.jsonl will block the backup indefinitely when no reader is attached. The current symlink and hard-link checks do not catch this case (mkfifo produces nlink=1 and is not a symlink), so a local adversary or stale artifact in the output directory can hang the first TTL write.
Useful? React with 👍 / 👎.
Codex P2 round 11 (commit 20236ac): `openSidecarFile` opened the TTL sidecar with `O_WRONLY|O_CREATE|O_NOFOLLOW` and only checked the resulting descriptor's link count. A pre-existing FIFO at strings_ttl.jsonl / hll_ttl.jsonl would block the first write indefinitely (POSIX: opening a reader-less FIFO with O_WRONLY hangs until a reader attaches). The symlink and hard-link guards do not catch this case: `mkfifo` produces nlink=1 and is not a symlink. Add O_NONBLOCK to the unix open so a reader-less FIFO surfaces ENXIO immediately instead of hanging, then enforce info.Mode().IsRegular() after fstat. Reader-attached FIFOs, sockets, and character/block devices are all refused with a stable "non-regular file" error. The Windows path is unchanged because Windows has no FIFO concept. Test: TestRedisDB_OpenJSONLRefusesFIFO lives in redis_string_unix_test.go behind `//go:build unix` because syscall.Mkfifo is undefined on Windows and js/wasm. Verified clean: linux race tests, GOOS=windows GOARCH=amd64 build, GOOS=js GOARCH=wasm build, golangci-lint.
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
@claude review Round 9 fix for Codex P2 round 11 on commit 20236ac (commit
Added Test: Verified clean: |
|
Claude finished @bootjp's task in 3m 47s —— View job Review of PR #713 — Round 9 (commit
|
| Step | What it catches |
|---|---|
| ELOOP on open | Symlinks |
| ENXIO on open | Reader-less FIFOs |
!IsRegular() after fstat |
Reader-attached FIFOs, sockets, devices |
Nlink > 1 |
Hard links to files outside dump tree |
Truncate(0) |
Only reached for verified single-link regular files |
All four threat vectors are now addressed with no gap between them. ✓
O_NONBLOCK effect on write path: O_NONBLOCK is retained on the returned *os.File fd, meaning subsequent buffered write(2) calls are technically non-blocking. For regular files on local block devices (ext4, XFS, Btrfs), the kernel never returns EAGAIN on write(2) — regular-file writes always go to the page cache. This is a no-op in practice and the IsRegular() check ensures only regular files are returned. ✓
Test: TestRedisDB_OpenJSONLRefusesFIFO (redis_string_unix_test.go:24)
The test creates a real FIFO via syscall.Mkfifo at the exact path openJSONL would open (redisStringsTTLFile), then drives HandleString with an expiring string (which triggers appendTTL → openJSONL). The error-message check accepts both "FIFO at" and "non-regular file" — correctly handling both the ENXIO path (reader-less) and the IsRegular path (reader-attached). The t.Skipf guard for platforms where mkfifo isn't available is the right defensive pattern. ✓
Residual Item (Carried from Round 5 — Still Open)
Medium: open_nofollow_windows.go and open_nofollow_other.go return bare os.OpenFile errors
Both files return the os.OpenFile result unwrapped:
// open_nofollow_windows.go:25 and open_nofollow_other.go:24
return os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600) //nolint:gosec,mndThe //nolint:gosec,mnd comment suppresses gosec and mnd but not wrapcheck. The .golangci.yaml (line 27) enables wrapcheck without a cross-platform build-tag exemption. Linux CI avoids the issue only because these files are excluded by their build tags. Any GOOS=windows golangci-lint run (local dev on Windows, future CI step) would flag both lines.
The unix counterpart correctly wraps on every error path with cockroachdberr.WithStack.
Fix open_nofollow_windows.go →
Pass Summaries
-
Data loss — The four-gate open sequence (ELOOP / ENXIO / IsRegular / Nlink) is exhaustive for the known threat vectors against sidecar files.
Truncate(0)is only reached for verified single-link regular files.writeFileAtomic(blob path) is unchanged. No new data-loss vectors. -
Concurrency — Single-threaded by design.
O_NONBLOCK+IsRegular()are syscall-level primitives with no race window between them (the fstat is on the open fd, not a re-stat of the path). The TOCTOU between the IsRegular check and Truncate is not exploitable by swapping the path (the fd is already open against the inode). No issues. -
Performance — One extra fstat-on-fd and one mode flag check per sidecar open; sidecars are opened once per db encoder. Negligible.
-
Data consistency — Magic-byte constants, deduplication, KEYMAP emission, and orphan-TTL counting are unchanged. The fix is purely in the sidecar file-open layer. ✓
-
Test coverage — 22 tests (21 in
redis_string_test.go+ 1 inredis_string_unix_test.go).TestRedisDB_OpenJSONLRefusesFIFOcovers the ENXIO path end-to-end. TheIsRegularpath for a reader-attached FIFO is not separately tested (would require a goroutine to hold the read end open), but the code is a two-line guard that directly follows the ENXIO block — the gap is minimal.
Overall: Round 9 is clean. The FIFO guard is correct, well-commented, and tested. The single remaining open item — the unwrapped error in windows.go / other.go — is a one-line fix in each file, and it's the same carry-over from Round 5. Either fix it now or defer with an explicit note; it's the last gap before this PR is clean across the board.
| Branch |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4872704685
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| f, err := openJSONL(filepath.Join(r.dbDir(), baseName)) | ||
| if err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
Validate parent path before creating TTL sidecar files
Fresh evidence in this revision: writeBlob now uses ensureDir/assertNoSymlinkAncestors, but TTL sidecars still bypass that guard by calling openJSONL directly from appendTTL. Because openJSONL only applies MkdirAll(dir) plus openSidecarFile(path) (which protects only the final path element), a symlinked ancestor such as <outRoot>/redis/db_0 -> /tmp/outside will still redirect strings_ttl.jsonl/hll_ttl.jsonl writes outside the backup root.
Useful? React with 👍 / 👎.
Codex P1 round 13 (commit 4872704): writeBlob already routed through ensureDir + assertNoSymlinkAncestors (PR #713 round 7), but TTL sidecars (appendTTL → openJSONL) bypassed that guard. openJSONL ran `os.MkdirAll(filepath.Dir(path))` and then openSidecarFile — which protects only the final path element. A symlinked ancestor like `<outRoot>/redis/db_0 -> /tmp/outside` would still redirect strings_ttl.jsonl / hll_ttl.jsonl writes outside the dump root. appendTTL now calls r.ensureDir(r.dbDir()) before openJSONL so the same ancestor-walk protection applies. Caller audit (per loop's audit-semantics rule): - openJSONL has two callers: appendTTL (this file, fixed) and sqs.go's flushQueue/flushInternals. The SQS path bypasses this RedisDB.ensureDir helper because SQS lives in a different encoder; that's tracked separately and is out-of-scope for this PR's redis fix. Test: TestRedisDB_RefusesSymlinkedAncestorOnTTLSidecar pre-places `<outRoot>/redis/db_0 -> /tmp/bait` and asserts HandleString (which drives appendTTL on first TTL-bearing value) refuses with the standard "refusing to traverse symlinked ancestor" error.
|
@claude review Round 13 fix for Codex P1 round 13 on commit 4872704 (commit
Caller audit (per loop's audit-semantics rule): Test: |
|
Claude finished @bootjp's task in 5m 20s —— View job Review of PR #713 — Round 13 (commit
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0383924ba2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if _, ok := r.dirsCreated[dir]; ok { | ||
| return nil |
There was a problem hiding this comment.
Revalidate cached dirs before every write
ensureDir skips all checks once a path is present in dirsCreated, so a directory that was safe on the first write can later be replaced with a symlink and subsequent writes will bypass assertNoSymlinkAncestors. In that scenario (e.g., after one successful write, swap redis/db_0/strings to a symlink), later writeBlob/appendTTL calls can be redirected outside outRoot, which reintroduces the path-escape issue this hardening is meant to prevent.
Useful? React with 👍 / 👎.
Codex P1 round 13 follow-up (commit 0383924): ensureDir's dirsCreated cache short-circuited BOTH the MkdirAll syscall AND the assertNoSymlinkAncestors security check. After one successful write, an adversary that can write the output directory could swap the now-cached real dir for a symlink to outside the dump root, and subsequent writeBlob/appendTTL calls would follow it without re-validation — reintroducing the path-escape vector that the round-9 ancestor check was meant to close. Restructure ensureDir so the cache only short-circuits MkdirAll; assertNoSymlinkAncestors runs on every call. Repeated symlink walks are an acceptable cost on the hot path because they touch ~5 ancestor entries per dump-root depth and are dominated by write I/O anyway. Caller audit (per loop's audit-semantics rule for behavior change): three callers in redis_string.go (writeBlob, appendTTL, recordIfFallback) all already check the error return and propagate. No API change. Test: TestRedisDB_EnsureDirRevalidatesAfterCachedSuccess seeds the cache via a first HandleString, then swaps the cached real dir for a symlink targeting <bait>/, then drives a second HandleString and asserts the standard "refusing to traverse symlinked ancestor" refusal plus that the bait dir is untouched.
|
@claude review Round 14 fix for Codex P1 round 13 follow-up on commit 0383924 (commit
Restructured Caller audit (per loop's audit-semantics rule for behavior change): three callers in redis_string.go ( Test: |
|
Claude finished @bootjp's task in 2m 18s —— View job Review of PR #713 — Round 14 (commit
|
| Caller | dir argument | assertNoSymlinkAncestors fires |
|---|---|---|
writeBlob |
dbDir()/strings or dbDir()/hll |
On every blob write |
appendTTL |
dbDir() |
On first TTL write per encoder lifetime |
recordIfFallback |
dbDir() |
On first fallback key per encoder lifetime |
Cache semantics are now correct: dirsCreated means "MkdirAll has been called for this dir, no need to repeat it" — not "this dir has been fully validated, no need to recheck". The security check and the performance optimisation are correctly decoupled.
TOCTOU posture is unchanged: The window between assertNoSymlinkAncestors returning and the actual writeFileAtomic / openSidecarFile remains. The code correctly documents this as best-effort. What the fix closes is the much wider window that existed between the first write (where the check ran) and all subsequent writes to the same cached dir (where it did not). That wider window is what Codex P1 round 13 identified.
TestRedisDB_EnsureDirRevalidatesAfterCachedSuccess (redis_string_test.go:210):
- Seeds
dirsCreatedfor<root>/redis/db_0/stringsvia the firstHandleString os.RemoveAll+os.Symlinkswaps the real dir for a symlink to<bait>/- Second
HandleString→writeBlob→ensureDir→assertNoSymlinkAncestorsfires and returns the expected error - Asserts the bait dir is untouched ✓
The test exercises the writeBlob path. The appendTTL path's ensureDir(dbDir()) guard is a superset (if db_0/strings would be caught, db_0 would also be caught); no separate test is needed.
Residual Item — Carried from Round 5 (10 consecutive rounds, still open)
Medium: open_nofollow_windows.go and open_nofollow_other.go return bare os.OpenFile errors (open_nofollow_windows.go:25, open_nofollow_other.go:24)
// Both files — line 25 / 24 respectively:
return os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600) //nolint:gosec,mndThe //nolint:gosec,mnd suppresses gosec and mnd but not wrapcheck. The unix counterpart wraps every OS error with cockroachdberr.WithStack; these two do not. A GOOS=windows golangci-lint step would flag both. Two-line fix per file:
Fix open_nofollow_windows.go →
Pass Summaries
-
Data loss —
assertNoSymlinkAncestorsnow runs before every write path regardless of cache state. A swapped-in symlink is caught on the nextensureDircall. ThedirsCreatedcache correctly persistsMkdirAllsuppression without persisting the security bypass. No new data-loss vectors. -
Concurrency — Single-threaded by design.
assertNoSymlinkAncestorsis a pure function (no shared state, onlyos.Lstatcalls).dirsCreatedreads and writes are sequential. No issues. -
Performance —
assertNoSymlinkAncestorsnow adds ~4Lstatsyscalls per blob write on the hot path (root + redis + db_N + subdir). For 10M keys this is ~40M Lstat calls. The comment documents this tradeoff: "dominated by write I/O" — each blob write already costs a CreateTemp + Write + Rename (~3 syscalls minimum). The Lstat overhead is roughly proportional. Acceptable for an offline backup tool. -
Data consistency — Cache semantics are correct. All three callers (
writeBlob,appendTTL,recordIfFallback) benefit from both MkdirAll caching and per-call ancestor re-validation. The test verifies the re-validation fires after a cache-hit scenario. ✓ -
Test coverage — 24 tests.
TestRedisDB_EnsureDirRevalidatesAfterCachedSuccesscleanly covers the new behavior at the level that matters: thedirsCreatedcache no longer bypasses the security check. All prior tests continue to pass (the happy path is unchanged).
Overall: Round 14 is a clean, well-targeted fix for the cache-bypass vector. The restructured ensureDir is the minimum necessary change, the doc comment is accurate, and the regression test is precise. The single remaining item to close before merge — present for 10 rounds — is the two-line wrapcheck fix in open_nofollow_windows.go and open_nofollow_other.go.
| Branch |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8e8f8dda14
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| t.Fatal(err) | ||
| } | ||
| fifoPath := filepath.Join(dir, redisStringsTTLFile) | ||
| if err := syscall.Mkfifo(fifoPath, 0o600); err != nil { |
There was a problem hiding this comment.
Restrict FIFO test to GOOS targets that expose Mkfifo
This test is built on all unix targets, but syscall.Mkfifo is not defined on every one of them (e.g., GOOS=solaris), so go test ./internal/backup fails at compile time before the runtime t.Skip can execute. I verified this by cross-compiling tests for Solaris, which errors on this symbol, making the new test suite non-portable across supported Unix build tags.
Useful? React with 👍 / 👎.
…und 15) Codex P2 round 14 (commit 8e8f8dd): The round-9 FIFO test used `syscall.Mkfifo` behind a `//go:build unix` tag, but the standard library's syscall package does not define Mkfifo on every `unix`-tagged target — Solaris is the documented gap. `GOOS=solaris go test -c ./internal/backup` errored at compile time on the symbol, breaking cross-Unix portability claimed by the build tag. Switch to `golang.org/x/sys/unix.Mkfifo` which is defined consistently across every `unix` target. The dependency is already in go.mod (used by disk_full_windows.go for the ERROR_DISK_FULL constants). Verified: - GOOS=solaris GOARCH=amd64 go test -c ./internal/backup → clean. - go test -race on linux → clean. - golangci-lint → clean. No production code change; test-only file.
|
@claude review Round 15 fix for Codex P2 round 14 on commit 8e8f8dd (commit The round-9 FIFO test used Switched to Verified:
Test-only file; no production change. Fix merged forward into #714 / #716 / #725. |
|
Claude finished @bootjp's task in 1m 55s —— View job Review of PR #713 — Round 15 (commit
|
syscall.Mkfifo |
golang.org/x/sys/unix.Mkfifo |
|
|---|---|---|
Available on all //go:build unix targets |
❌ (missing on Solaris) | ✅ (consistent across all unix) |
Already a go.mod dependency |
N/A (stdlib) | ✅ (golang.org/x/sys v0.43.0) |
| Runtime no-op guard for unsupported platforms | t.Skipf on error ✓ |
t.Skipf on error ✓ |
The t.Skipf guard is preserved and correctly handles the case where unix.Mkfifo compiles but is unavailable at runtime on a given platform. No behavior change on linux — the test exercises the same FIFO-refusal path as before. ✓
wrapcheck Gap — Fixed in eb5f3c2
This item was noted in every review round since Round 5 (11 consecutive rounds). Both open_nofollow_windows.go and open_nofollow_other.go returned the os.OpenFile error bare, escaping wrapcheck only because Linux CI excludes them via build tags.
Before (both files, line 25/24):
return os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600)After (both files):
f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600)
if err != nil {
return nil, cockroachdberr.WithStack(err)
}
return f, nilThis matches the unix counterpart's wrapping convention exactly. cockroachdberr was already imported in both files; no new dependencies. ✓
Pass Summaries
- Data loss — Test-only change; no production code touched. All prior data-loss guards unchanged. No new vectors.
- Concurrency — Single-threaded by design. Unchanged.
- Performance — No-op for the test change. The
wrapcheckfix adds a stack capture on the error path only, which is not on the hot path. No regressions. - Data consistency — No storage-layer changes. ✓
- Test coverage —
TestRedisDB_OpenJSONLRefusesFIFOis now portable to Solaris cross-compilation. All 24 prior tests remain intact.
No remaining open items. The PR is clean across all five review passes with no unresolved gaps. The 11-round wrapcheck carry-over has been closed in eb5f3c2.
| Branch
Closes the wrapcheck gap that's been noted in reviews since round 5: open_nofollow_windows.go and open_nofollow_other.go were returning the os.OpenFile error bare, matching none of the repo's cockroachdberr.WithStack convention. Linux CI only compiles the unix variant so the linter never fires there, but a GOOS=windows golangci-lint run would flag both. Co-authored-by: Yoshiaki Ueda (bootjp) <bootjp@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal/backup/redis_string.go (1)
284-299: 💤 Low valueImplementation is correct but document the negative-input behavior.
intToDecimalreturns an empty string for negative inputs (thefor v > 0loop never executes). This is safe because callers (redisDBSegment,NewRedisDB) ensure non-negative values, but a brief doc comment would clarify the contract.📝 Suggested doc clarification
// intToDecimal is a tiny zero-allocation helper for non-negative ints. // Avoids the strconv import here just to format dbIndex. +// Precondition: v >= 0; negative values return "". func intToDecimal(v int) string {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/backup/redis_string.go` around lines 284 - 299, Document that intToDecimal expects a non-negative integer and that negative inputs yield an empty string; update the comment above the intToDecimal function to state the contract and mention callers (redisDBSegment, NewRedisDB) ensure non-negative values, so this function intentionally does not handle negatives.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/backup/redis_string_test.go`:
- Around line 241-276: The test currently places a symlink at db_0 which causes
the blob-path ensureDir (invoked by HandleString) to fail early, so appendTTL
isn't exercised; change the setup so the symlink exists only on the TTL sidecar
path but not the blob path. Concretely: create a real db_0/strings directory
first (e.g. call db.HandleHLL or a blob write to create the blob dir), then
remove/replace db_0 with the symlink to bait, then invoke the TTL-writing code
path (e.g. db.HandleTTL or db.HandleString with a TTL-bearing value so appendTTL
is executed) and assert the call errors with "refusing to traverse symlinked
ancestor" and that bait/strings_ttl.jsonl is not created; references:
NewRedisDB, HandleString, HandleHLL, appendTTL, ensureDir, openJSONL.
---
Nitpick comments:
In `@internal/backup/redis_string.go`:
- Around line 284-299: Document that intToDecimal expects a non-negative integer
and that negative inputs yield an empty string; update the comment above the
intToDecimal function to state the contract and mention callers (redisDBSegment,
NewRedisDB) ensure non-negative values, so this function intentionally does not
handle negatives.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b880ab0e-ea3b-4824-9536-e0669ea96368
📒 Files selected for processing (9)
internal/backup/disk_full_other.gointernal/backup/disk_full_unix.gointernal/backup/disk_full_windows.gointernal/backup/open_nofollow_other.gointernal/backup/open_nofollow_unix.gointernal/backup/open_nofollow_windows.gointernal/backup/redis_string.gointernal/backup/redis_string_test.gointernal/backup/redis_string_unix_test.go
| // TestRedisDB_RefusesSymlinkedAncestorOnTTLSidecar is the regression | ||
| // for Codex P1 round 13 (PR #713): writeBlob already routed through | ||
| // ensureDir/assertNoSymlinkAncestors, but TTL sidecars (appendTTL -> | ||
| // openJSONL) bypassed that guard because openJSONL only protected | ||
| // the final path element via openSidecarFile. A symlinked ancestor | ||
| // like `<outRoot>/redis/db_0 -> /tmp/outside` would then redirect | ||
| // strings_ttl.jsonl writes outside the dump root. appendTTL now | ||
| // calls ensureDir on the parent directory before opening the | ||
| // sidecar. | ||
| func TestRedisDB_RefusesSymlinkedAncestorOnTTLSidecar(t *testing.T) { | ||
| t.Parallel() | ||
| root := t.TempDir() | ||
| bait := filepath.Join(root, "bait-tree") | ||
| if err := os.MkdirAll(bait, 0o755); err != nil { | ||
| t.Fatal(err) | ||
| } | ||
| parent := filepath.Join(root, "redis") | ||
| if err := os.MkdirAll(parent, 0o755); err != nil { | ||
| t.Fatal(err) | ||
| } | ||
| // Symlink `<outRoot>/redis/db_0` to a directory outside the | ||
| // dump root. Without the ancestor guard, the first appendTTL | ||
| // would happily write strings_ttl.jsonl into <bait>/. | ||
| if err := os.Symlink(bait, filepath.Join(parent, "db_0")); err != nil { | ||
| t.Fatal(err) | ||
| } | ||
| db := NewRedisDB(root, 0) | ||
| // HandleString with a TTL-bearing value drives appendTTL. | ||
| err := db.HandleString([]byte("k"), encodeNewStringValue(t, []byte("v"), fixedExpireMs)) | ||
| if err == nil || !strings.Contains(err.Error(), "refusing to traverse symlinked ancestor") { | ||
| t.Fatalf("expected symlinked-ancestor refusal, got %v", err) | ||
| } | ||
| if _, statErr := os.Stat(filepath.Join(bait, "strings_ttl.jsonl")); !os.IsNotExist(statErr) { | ||
| t.Fatalf("TTL sidecar written through ancestor symlink: stat err=%v", statErr) | ||
| } | ||
| } |
There was a problem hiding this comment.
TestRedisDB_RefusesSymlinkedAncestorOnTTLSidecar doesn't exercise the TTL-specific fix it claims.
The symlink is placed at root/redis/db_0, which is an ancestor of both the blob directory (db_0/strings/) and the TTL sidecar (db_0/strings_ttl.jsonl). HandleString first calls ensureDir for the blob path; that call detects db_0 as a symlink and returns early — the appendTTL path is never reached. Reverting the ensureDir call added to appendTTL would leave this test green.
To specifically cover the regression, the symlinked ancestor must be on the TTL sidecar's path but not on the blob path. One approach:
- Pre-create a real
db_0/strings/dir (so a firstHandleHLLwrite succeeds). - Then replace
db_0with a symlink. - Call
HandleTTLfor the HLL key — only the TTL path hits the freshly symlinked ancestor.
🧪 Suggested alternative test setup
-func TestRedisDB_RefusesSymlinkedAncestorOnTTLSidecar(t *testing.T) {
+func TestRedisDB_RefusesSymlinkedAncestorOnTTLSidecarViaHandleTTL(t *testing.T) {
t.Parallel()
root := t.TempDir()
bait := filepath.Join(root, "bait-tree")
if err := os.MkdirAll(bait, 0o755); err != nil {
t.Fatal(err)
}
- parent := filepath.Join(root, "redis")
- if err := os.MkdirAll(parent, 0o755); err != nil {
- t.Fatal(err)
- }
- if err := os.Symlink(bait, filepath.Join(parent, "db_0")); err != nil {
- t.Fatal(err)
- }
db := NewRedisDB(root, 0)
- // HandleString with a TTL-bearing value drives appendTTL.
- err := db.HandleString([]byte("k"), encodeNewStringValue(t, []byte("v"), fixedExpireMs))
+ // First, let HandleHLL succeed so the blob path is established clean.
+ if err := db.HandleHLL([]byte("hll-k"), []byte{0x01}); err != nil {
+ t.Fatal(err)
+ }
+ // Adversary replaces db_0 with a symlink after blobs are already written.
+ dbDir := filepath.Join(root, "redis", "db_0")
+ if err := os.RemoveAll(dbDir); err != nil {
+ t.Fatal(err)
+ }
+ if err := os.Symlink(bait, dbDir); err != nil {
+ t.Fatal(err)
+ }
+ // HandleTTL drives appendTTL → ensureDir on db_0 → detects symlink.
+ err := db.HandleTTL([]byte("hll-k"), encodeTTLValue(fixedExpireMs))
if err == nil || !strings.Contains(err.Error(), "refusing to traverse symlinked ancestor") {
t.Fatalf("expected symlinked-ancestor refusal, got %v", err)
}
if _, statErr := os.Stat(filepath.Join(bait, "strings_ttl.jsonl")); !os.IsNotExist(statErr) {
- t.Fatalf("TTL sidecar written through ancestor symlink: stat err=%v", statErr)
+ t.Fatalf("TTL sidecar written through ancestor symlink via HandleTTL: stat err=%v", statErr)
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // TestRedisDB_RefusesSymlinkedAncestorOnTTLSidecar is the regression | |
| // for Codex P1 round 13 (PR #713): writeBlob already routed through | |
| // ensureDir/assertNoSymlinkAncestors, but TTL sidecars (appendTTL -> | |
| // openJSONL) bypassed that guard because openJSONL only protected | |
| // the final path element via openSidecarFile. A symlinked ancestor | |
| // like `<outRoot>/redis/db_0 -> /tmp/outside` would then redirect | |
| // strings_ttl.jsonl writes outside the dump root. appendTTL now | |
| // calls ensureDir on the parent directory before opening the | |
| // sidecar. | |
| func TestRedisDB_RefusesSymlinkedAncestorOnTTLSidecar(t *testing.T) { | |
| t.Parallel() | |
| root := t.TempDir() | |
| bait := filepath.Join(root, "bait-tree") | |
| if err := os.MkdirAll(bait, 0o755); err != nil { | |
| t.Fatal(err) | |
| } | |
| parent := filepath.Join(root, "redis") | |
| if err := os.MkdirAll(parent, 0o755); err != nil { | |
| t.Fatal(err) | |
| } | |
| // Symlink `<outRoot>/redis/db_0` to a directory outside the | |
| // dump root. Without the ancestor guard, the first appendTTL | |
| // would happily write strings_ttl.jsonl into <bait>/. | |
| if err := os.Symlink(bait, filepath.Join(parent, "db_0")); err != nil { | |
| t.Fatal(err) | |
| } | |
| db := NewRedisDB(root, 0) | |
| // HandleString with a TTL-bearing value drives appendTTL. | |
| err := db.HandleString([]byte("k"), encodeNewStringValue(t, []byte("v"), fixedExpireMs)) | |
| if err == nil || !strings.Contains(err.Error(), "refusing to traverse symlinked ancestor") { | |
| t.Fatalf("expected symlinked-ancestor refusal, got %v", err) | |
| } | |
| if _, statErr := os.Stat(filepath.Join(bait, "strings_ttl.jsonl")); !os.IsNotExist(statErr) { | |
| t.Fatalf("TTL sidecar written through ancestor symlink: stat err=%v", statErr) | |
| } | |
| } | |
| // TestRedisDB_RefusesSymlinkedAncestorOnTTLSidecarViaHandleTTL is the regression | |
| // for Codex P1 round 13 (PR `#713`): writeBlob already routed through | |
| // ensureDir/assertNoSymlinkAncestors, but TTL sidecars (appendTTL -> | |
| // openJSONL) bypassed that guard because openJSONL only protected | |
| // the final path element via openSidecarFile. A symlinked ancestor | |
| // like `<outRoot>/redis/db_0 -> /tmp/outside` would then redirect | |
| // strings_ttl.jsonl writes outside the dump root. appendTTL now | |
| // calls ensureDir on the parent directory before opening the | |
| // sidecar. | |
| func TestRedisDB_RefusesSymlinkedAncestorOnTTLSidecarViaHandleTTL(t *testing.T) { | |
| t.Parallel() | |
| root := t.TempDir() | |
| bait := filepath.Join(root, "bait-tree") | |
| if err := os.MkdirAll(bait, 0o755); err != nil { | |
| t.Fatal(err) | |
| } | |
| db := NewRedisDB(root, 0) | |
| // First, let HandleHLL succeed so the blob path is established clean. | |
| if err := db.HandleHLL([]byte("hll-k"), []byte{0x01}); err != nil { | |
| t.Fatal(err) | |
| } | |
| // Adversary replaces db_0 with a symlink after blobs are already written. | |
| dbDir := filepath.Join(root, "redis", "db_0") | |
| if err := os.RemoveAll(dbDir); err != nil { | |
| t.Fatal(err) | |
| } | |
| if err := os.Symlink(bait, dbDir); err != nil { | |
| t.Fatal(err) | |
| } | |
| // HandleTTL drives appendTTL → ensureDir on db_0 → detects symlink. | |
| err := db.HandleTTL([]byte("hll-k"), encodeTTLValue(fixedExpireMs)) | |
| if err == nil || !strings.Contains(err.Error(), "refusing to traverse symlinked ancestor") { | |
| t.Fatalf("expected symlinked-ancestor refusal, got %v", err) | |
| } | |
| if _, statErr := os.Stat(filepath.Join(bait, "strings_ttl.jsonl")); !os.IsNotExist(statErr) { | |
| t.Fatalf("TTL sidecar written through ancestor symlink via HandleTTL: stat err=%v", statErr) | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/backup/redis_string_test.go` around lines 241 - 276, The test
currently places a symlink at db_0 which causes the blob-path ensureDir (invoked
by HandleString) to fail early, so appendTTL isn't exercised; change the setup
so the symlink exists only on the TTL sidecar path but not the blob path.
Concretely: create a real db_0/strings directory first (e.g. call db.HandleHLL
or a blob write to create the blob dir), then remove/replace db_0 with the
symlink to bait, then invoke the TTL-writing code path (e.g. db.HandleTTL or
db.HandleString with a TTL-bearing value so appendTTL is executed) and assert
the call errors with "refusing to traverse symlinked ancestor" and that
bait/strings_ttl.jsonl is not created; references: NewRedisDB, HandleString,
HandleHLL, appendTTL, ensureDir, openJSONL.
Summary
Stacked on top of #712. Adds the first per-adapter encoder of the Phase 0 logical-backup decoder — covers the three Redis prefixes that always map ONE snapshot record to one user key:
!redis|str|<userKey>→strings/<encoded>.bin(+ inline-TTL stripped intostrings_ttl.jsonl). Both new magic-prefix format and legacy raw bytes accepted.!redis|hll|<userKey>→hll/<encoded>.bin(raw HLL sketch).!redis|ttl|<userKey>→ routed by prior type observation:redisKindString→strings_ttl.jsonl(legacy strings)redisKindHLL→hll_ttl.jsonlredisKindUnknown→ buffered for wide-column types; orphan warning atFinalize.Wide-column types (hash / list / set / zset / stream) ship in follow-up PRs.
How TTL routing works
Pebble lex order guarantees
!redis|hll|*<!redis|str|*<!redis|ttl|*(h < s < t). So whenHandleTTLis called, the type tracker already knows what kind of key the user-key belongs to. No two-pass needed.Safety
EncodeSegment(the encoding contract from backup: filename encoding (Phase 0a foundation) #711).writeFileAtomicrefuses to overwrite a symlink so a malicious or accidental link cannot redirect output bytes.ErrRedisInvalidStringValue,ErrRedisInvalidTTLValue) rather than silent data loss.0xFF 0x01magic constants local (not imported fromadapter/redis_compat_types.go) because the decoder is intentionally adapter-independent — must run as an offline tool with no live cluster. A doc comment points back to the live source so a future renamer updates both.Test plan
go test -race ./internal/backup/...— pass.golangci-lint run ./internal/backup/...— clean.HasInlineTTLhelper.Stacking
Base:
feat/backup-phase0a-keymap-manifest(PR #712).Summary by CodeRabbit
New Features
Bug Fixes
Tests