Skip to content

backup: Redis simple-type encoders (strings, HLL, TTL routing)#713

Open
bootjp wants to merge 18 commits intomainfrom
feat/backup-phase0a-redis-simple
Open

backup: Redis simple-type encoders (strings, HLL, TTL routing)#713
bootjp wants to merge 18 commits intomainfrom
feat/backup-phase0a-redis-simple

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented Apr 29, 2026

Summary

Stacked on top of #712. Adds the first per-adapter encoder of the Phase 0 logical-backup decoder — covers the three Redis prefixes that always map ONE snapshot record to one user key:

  • !redis|str|<userKey>strings/<encoded>.bin (+ inline-TTL stripped into strings_ttl.jsonl). Both new magic-prefix format and legacy raw bytes accepted.
  • !redis|hll|<userKey>hll/<encoded>.bin (raw HLL sketch).
  • !redis|ttl|<userKey> → routed by prior type observation:
    • redisKindStringstrings_ttl.jsonl (legacy strings)
    • redisKindHLLhll_ttl.jsonl
    • redisKindUnknown → buffered for wide-column types; orphan warning at Finalize.

Wide-column types (hash / list / set / zset / stream) ship in follow-up PRs.

How TTL routing works

Pebble lex order guarantees !redis|hll|* < !redis|str|* < !redis|ttl|* (h < s < t). So when HandleTTL is called, the type tracker already knows what kind of key the user-key belongs to. No two-pass needed.

Safety

  • Filenames go through EncodeSegment (the encoding contract from backup: filename encoding (Phase 0a foundation) #711).
  • Blob writes are atomic (tmp + rename); writeFileAtomic refuses to overwrite a symlink so a malicious or accidental link cannot redirect output bytes.
  • Truncated magic-prefix headers and bad-length TTL records surface typed errors (ErrRedisInvalidStringValue, ErrRedisInvalidTTLValue) rather than silent data loss.
  • The package keeps the 0xFF 0x01 magic constants local (not imported from adapter/redis_compat_types.go) because the decoder is intentionally adapter-independent — must run as an offline tool with no live cluster. A doc comment points back to the live source so a future renamer updates both.

Test plan

  • go test -race ./internal/backup/... — pass.
  • golangci-lint run ./internal/backup/... — clean.
  • 11 tests covering new/legacy/inline-TTL strings, HLL raw round-trip, TTL routing for both kinds, orphan-TTL warning, truncated header/TTL rejection, EncodeSegment integration, symlink-overwrite refusal, HasInlineTTL helper.

Stacking

Base: feat/backup-phase0a-keymap-manifest (PR #712).

Summary by CodeRabbit

  • New Features

    • Per-Redis-database output routing so backups are written into per-DB locations.
    • TTL handling improved: inline TTL deduplication and orphan TTL counting reported at finalize.
  • Bug Fixes

    • Stronger protections when creating sidecar files: refuses symlinks, hardlink clobbering, and non-regular files (e.g., FIFOs).
    • More reliable disk-full detection across platforms.
  • Tests

    • Added extensive regression tests covering safety, routing, TTL, and out-of-space scenarios.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 29, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9b27f1c1-4629-4288-b3fd-000b498c6e95

📥 Commits

Reviewing files that changed from the base of the PR and between 011b031 and eb5f3c2.

📒 Files selected for processing (2)
  • internal/backup/open_nofollow_other.go
  • internal/backup/open_nofollow_windows.go
✅ Files skipped from review due to trivial changes (1)
  • internal/backup/open_nofollow_other.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/backup/open_nofollow_windows.go

📝 Walkthrough

Walkthrough

Cross-platform disk-full detection and safe sidecar file opening are added; Redis backup handler is refactored to use per-database output directories, deduplicate inline TTLs, count orphan TTLs, lazily emit KEYMAP sidecars for SHA-fallbacks, and harden directory creation with symlink/hard-link/FIFO protections.

Changes

Redis Backup Safety and Per-DB Support

Layer / File(s) Summary
Platform Disk-Full Detection
internal/backup/disk_full_unix.go, internal/backup/disk_full_windows.go, internal/backup/disk_full_other.go
Adds platform-specific isDiskFullError(err error) bool helpers: Unix checks syscall.ENOSPC via errors.Is, Windows checks Windows disk-full codes and syscall.ENOSPC, other platforms return false.
Platform Safe Sidecar Opening
internal/backup/open_nofollow_unix.go, internal/backup/open_nofollow_windows.go, internal/backup/open_nofollow_other.go
Adds openSidecarFile(path string) per-platform: Unix opens with `O_NOFOLLOW
RedisDB Shape & Initialization
internal/backup/redis_string.go (lines 75–167)
RedisDB gains dbIndex and related per-DB state (dirsCreated, inlineTTLEmitted, orphanTTLCount, keymap*). NewRedisDB now takes dbIndex and initializes the new state.
String & TTL Routing
internal/backup/redis_string.go (lines 189–243)
HandleString records inline TTL emission. HandleTTL skips TTL writes when inline TTL was emitted and increments orphanTTLCount for unknown keys instead of buffering pending TTLs.
Directory Safety & Writes
internal/backup/redis_string.go (lines 270–443)
Introduces ensureDir that caches MkdirAll successes but always revalidates symlink ancestors; routes writes into per-DB subdirectories via dbDir(); sidecar opens use openSidecarFile.
KEYMAP Sidecar Handling
internal/backup/redis_string.go (lines 445–490)
Adds recordIfFallback to lazily write KEYMAP.jsonl entries for SHA-fallback encodings and closeKeymap to flush/close/remove empty keymap on finalize.
Public API & Helpers
internal/backup/redis_string.go (lines 548–648)
NewRedisDB(outRoot, dbIndex) signature updated. Added exported IsBlobAtomicWriteOutOfSpace(err error) bool delegating to platform check; retained IsBlobAtomicWriteRetriable.
Tests & Validation
internal/backup/redis_string_test.go, internal/backup/redis_string_unix_test.go
Extensive regression tests added: TTL deduplication, symlink/hard-link/FIFO refusal, per-DB routing into db_<index>, KEYMAP creation/cleanup behavior, orphan TTL counting, dirsCreated caching behavior, and disk-full detection tests (including wrapped errors).

Sequence Diagram

sequenceDiagram
    participant Caller as Redis Backup<br/>Caller
    participant Handler as RedisDB<br/>Handler
    participant TTLMgr as TTL Router
    participant FileSys as Safe File Ops
    participant Disk as Disk Health
    participant KM as KEYMAP Writer

    Caller->>Handler: HandleString(key,value,ttl)
    Handler->>Handler: mark inlineTTLEmitted[key]=true
    Handler->>FileSys: writeBlob(key,data) -> ensureDir(per-DB)
    FileSys->>FileSys: revalidate ancestors for symlinks
    FileSys->>Disk: write blob data
    Disk-->>FileSys: disk-full status via isDiskFullError

    alt String had inline TTL
        Handler->>TTLMgr: skip TTL sidecar write
    else needs sidecar TTL
        Handler->>FileSys: appendTTL(strings_ttl.jsonl)
        FileSys->>FileSys: openSidecarFile(path) (reject symlinks/hardlinks/FIFO)
        FileSys->>Disk: write TTL entry
    end

    Handler->>Handler: HandleTTL(key,expiry)
    Handler->>TTLMgr: if inlineTTLEmitted skip else increment orphanTTLCount

    Handler->>KM: EncodeSegment(key)
    alt SHA-fallback
        KM->>FileSys: recordIfFallback(KEYMAP.jsonl)
        FileSys->>FileSys: openSidecarFile(KEYMAP)
        FileSys->>Disk: write keymap entry
    end

    Caller->>Handler: Finalize()
    Handler->>KM: closeKeymap()
    Handler->>Handler: report orphanTTLCount
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I hopped through paths both safe and sound,
No symlink traps nor hard-links found.
Per-db burrows tidy and neat,
TTLs deduped, keymaps complete.
If disk is full, I’ll give a shout—no doubt!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 62.86% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the primary changes: introducing Redis simple-type encoders (strings, HLL) with TTL routing and sidecar logic, matching the substantial additions to internal/backup/redis_string.go and related platform-specific files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/backup-phase0a-redis-simple

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 60 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces Redis simple-type encoders for strings, HyperLogLogs, and TTL records, featuring atomic file operations and sidecar file management. The review feedback identifies several improvement opportunities: implementing bounds on memory usage for pending TTL records to prevent OOM, parameterizing hardcoded database indices to align with documentation, optimizing directory creation to reduce redundant system calls, and expanding retriable error checks to include disk-full scenarios.

Comment thread internal/backup/redis_string.go Outdated
Comment on lines +186 to +189
r.pendingWideColumnTTL = append(r.pendingWideColumnTTL, redisTTLPending{
UserKey: bytes.Clone(userKey),
ExpireAtMs: expireAtMs,
})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The pendingWideColumnTTL slice grows unbounded as it accumulates TTL records for keys not yet claimed by string or HLL handlers. To prevent potential OOM issues, apply a fixed bound to this collection and reject new requests when the bound is reached, as per the repository's memory management guidelines.

References
  1. To prevent unbounded memory growth and potential OOM issues, apply a fixed bound to collections that can grow from external requests. Reject new requests when the bound is reached.

Comment thread internal/backup/redis_string.go Outdated

func (r *RedisDB) writeBlob(subdir string, userKey, value []byte) error {
encoded := EncodeSegment(userKey)
dir := filepath.Join(r.outRoot, "redis", "db_0", subdir)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The database index db_0 is hardcoded here and in appendTTL (line 230). This contradicts the documentation stating that RedisDB encodes one logical database and that the caller is responsible for choosing the index. This hardcoding will cause collisions if multiple RedisDB instances are used for different databases within the same outRoot. Consider passing the database index as a parameter to NewRedisDB and using it to construct the path.

func (r *RedisDB) writeBlob(subdir string, userKey, value []byte) error {
encoded := EncodeSegment(userKey)
dir := filepath.Join(r.outRoot, "redis", "db_0", subdir)
if err := os.MkdirAll(dir, 0o755); err != nil { //nolint:mnd // 0755 == standard dir mode
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Calling os.MkdirAll for every blob write is inefficient as it performs multiple system calls to verify the directory hierarchy. Since the subdirectories ('strings', 'hll') are fixed for a given RedisDB instance, consider caching the fact that they have been created (e.g., using a map[string]struct{}) to avoid redundant I/O on the hot path.

}
// errors.Is handles wrapped paths; both sentinel checks are stable
// for now because we never wrap them ourselves.
return errors.Is(err, io.ErrShortWrite)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider including syscall.ENOSPC (disk full) in the check for retriable errors. While io.ErrShortWrite is checked, os.File.Write typically returns a concrete system error like ENOSPC when the disk is full, which is a common scenario for backup operations that might be retriable after freeing up space.

bootjp added 2 commits April 30, 2026 19:26
All four Gemini findings addressed.

#189 -- pendingWideColumnTTL slice bounded.
The orphan-TTL buffer is now capped at maxPendingWideColumnTTL =
1,000,000 entries (~50 MiB). Records past the cap are dropped and
the warn sink reports the count at Finalize. Real production state
(where wide-column type encoders eventually claim every TTL) is far
under the cap; the bound only protects against malformed or
adversarial snapshots.

#217 -- db_0 hardcoding fixed.
NewRedisDB now takes a dbIndex parameter; the per-encoder root
"<outRoot>/redis/db_<idx>/" is computed from it. Two encoders with
the same outRoot but different indices no longer collide.
TestRedisDB_PerDBIndexRoutesIntoOwnDirectory locks in the
distinction.

#218 -- MkdirAll cached.
Added dirsCreated map[string]struct{} on RedisDB. ensureDir()
checks the map before MkdirAll so repeated writes (one per blob
record) collapse to a map lookup. For a 10M-key dump this saves
~10M stat+mkdir(EEXIST) round-trips.
TestRedisDB_DirsCreatedCachesMkdirAll asserts the cache is
populated exactly once per directory.

#382 -- ENOSPC handling.
Added IsBlobAtomicWriteOutOfSpace as the explicit ENOSPC probe.
IsBlobAtomicWriteRetriable continues to report only
io.ErrShortWrite as retriable -- ENOSPC is intentionally NOT
retriable (a backup against a full disk should surface to the
operator rather than spin). The two-function split lets the
master pipeline render the right alarm message.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 30, 2026

All four Gemini MEDIUM items addressed in 945d73c9:

#189pendingWideColumnTTL unbounded
Capped at maxPendingWideColumnTTL = 1,000,000 (~50 MiB). Records past the cap are dropped and counted via the warn sink at Finalize. The cap is well above any plausible real workload (wide-column encoders will claim every TTL anyway when they ship in a follow-up), but small enough to protect against a malformed snapshot.

#217 — Hardcoded db_0
NewRedisDB now takes a dbIndex parameter; the per-encoder root "<outRoot>/redis/db_<idx>/" is computed from it. Two encoders sharing an outRoot with different db indices no longer collide. TestRedisDB_PerDBIndexRoutesIntoOwnDirectory covers it.

#218MkdirAll per blob
dirsCreated map[string]struct{} on RedisDB. ensureDir() short-circuits on a map lookup; for a 10M-key dump this saves ~10M stat + mkdir(EEXIST) round-trips. TestRedisDB_DirsCreatedCachesMkdirAll asserts each subdir is populated exactly once.

#382 — ENOSPC
Added IsBlobAtomicWriteOutOfSpace as the explicit ENOSPC probe (via errors.Is(err, syscall.ENOSPC)). IsBlobAtomicWriteRetriable continues to report only io.ErrShortWrite as retriable — ENOSPC is deliberately not retriable (a backup against a full disk should surface to the operator, not spin). The two-function split lets the master pipeline pick the right alarm wording.

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 3m 18s —— View job


Review of PR #713 (after 945d73c address-review commit)

  • Read internal/backup/redis_string.go
  • Read internal/backup/redis_string_test.go
  • Pass 1: Data loss
  • Pass 2: Concurrency / distributed failures
  • Pass 3: Performance
  • Pass 4: Data consistency
  • Pass 5: Test coverage
  • Verify Gemini review items were properly addressed

Gemini Review Items — All Four Properly Addressed ✓

Item Verdict
#189 pendingWideColumnTTL OOM ✅ Cap implemented (maxPendingWideColumnTTL = 1_000_000); documented rationale is solid
#217 hardcoded db_0 NewRedisDB(outRoot, dbIndex int) + per-index path + collision test
#218 MkdirAll per blob dirsCreated map[string]struct{} + ensureDir short-circuit + cache test
#382 ENOSPC ✅ Separated into IsBlobAtomicWriteOutOfSpace (not retriable, explicit probe) — the two-function split and the "why not retriable" comment are well-reasoned

The constant cross-check against adapter/redis_compat_types.go also confirms the re-defined magic bytes are byte-for-byte identical to the live encoder.


Additional Findings

Medium

1. Dropped-TTL count in Finalize warning is inaccurate when the cap is hit (redis_string.go:242)

r.warn("redis_orphan_ttl",
    "count", len(r.pendingWideColumnTTL), // always ≤ 1_000_000
    ...

When 2M orphan TTLs arrive, the warning reports count: 1000000, silently hiding that another 1M were dropped. An operator reading that log would assume exactly 1M orphans are pending — not 2M. A droppedWideColumnTTL int counter incremented inside the else branch of HandleTTL's cap check would let Finalize report "dropped", r.droppedWideColumnTTL separately.

Fix this →

2. TestRedisDB_PendingWideColumnTTLBounded doesn't exercise the cap boundary (redis_string_test.go:312)

The test drives 10,000 entries (1% of the 1M cap) and asserts they all landed, then explicitly skips testing the drop behavior: "a 1M-record stress test in CI would be wasteful." But the critical path — the if len < maxPendingWideColumnTTL branch that drops entries — is never executed by any test. This means the cap check is untested dead code from coverage's perspective.

A straightforward fix: add a package-internal helper (or export a WithMaxPendingTTL(n int) option) that lets tests inject a lower cap, then verify that entry n+1 is dropped and len(db.pendingWideColumnTTL) == n. Alternatively, the test could drive exactly maxPendingWideColumnTTL + 1 entries in a goroutine-parallel fashion if 1M+1 iterations are too slow for CI.

Fix this →


Low / Nits

3. writeFileAtomic lacks fsync (redis_string.go:430)

tmp.Writetmp.Close()os.Rename() is not crash-durable on POSIX without an fsync between write and close. On a power failure after the rename but before the kernel flushes the page, the renamed file can contain zeros. For an offline rerunnable backup tool this may be an acceptable tradeoff (simply rerun), but it should be a deliberate, documented choice, not accidental. A //nolint:wrapcheck comment noting "no fsync by design — backup is idempotent; rerun on crash" costs one line.

4. Second closeJSONL error silently swallowed in Finalize (redis_string.go:236)

if err := closeJSONL(r.stringsTTL); err != nil && firstErr == nil {
    firstErr = err
}
if err := closeJSONL(r.hllTTL); err != nil && firstErr == nil {
    firstErr = err
}

If both flush/close fail (possible on NFS under heavy I/O), only the strings error is returned. The HLL sidecar error is silently swallowed and the caller has no way to know its TTL sidecar is corrupt/incomplete. Consider errors.Join or at minimum a second warn-sink call for the secondary error.

5. Ordering assumption undocumented in HandleTTL (redis_string.go:203)

The PR description explains that Pebble lex order guarantees hll < str < ttl, so HandleTTL is always called after the corresponding HandleString/HandleHLL. But this invariant is not stated in the doc comment for HandleTTL itself. A caller who doesn't read the PR would have no warning that out-of-order calls silently promote the TTL to pendingWideColumnTTL. Worth one sentence in the doc comment: "Callers must have already dispatched the corresponding Handle{String,HLL} record — guaranteed by Pebble key order; see PR description."

6. Double-clamping of dbIndex (redis_string.go:148 and redis_string.go:258)

NewRedisDB already clamps dbIndex < 0 to 0, and redisDBSegment also clamps it again. The second clamp is unreachable for any caller going through NewRedisDB. It's harmless but creates a confusing impression that dbIndex might escape the clamp somewhere.


Pass Summaries

  1. Data loss — Atomic write (tmp+rename) is sound; symlink guard is correct; deferred tmp cleanup is correct. The missing fsync is the one unresolved gap (see Update docker/setup-qemu-action action to v3 #3 above).
  2. Concurrency — Package is intentionally single-threaded; documented and appropriate.
  3. PerformanceensureDir caching eliminates hot-path stat calls; kindByKey map allocation is proportional and bounded; intToDecimal avoids strconv import on the hot path. No regressions.
  4. Data consistency — Magic-byte constants verified byte-for-byte against adapter/redis_compat_types.go; math.MaxInt64 clamp mirrors the live decoder; TTL sidecar writes mirror the snapshot record ordering contract.
  5. Test coverage — 15 tests, all passing per CI. Coverage gap: the cap-boundary drop path in HandleTTL is exercised by zero tests (see Update dependency go to v1.21.4 #2 above).

Overall: The four Gemini items are cleanly resolved; the code is well-structured and safe for its stated scope. Items #1 and #2 are worth a follow-up — the dropped-TTL count accuracy and the untested cap-boundary path are the two gaps that could mislead an operator or hide a future regression in the cap logic.
| Branch

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 84a1c5a3e0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +211 to +212
case redisKindString:
return r.appendTTL(&r.stringsTTL, redisStringsTTLFile, userKey, expireAtMs)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid duplicate TTL entries for new-format string keys

This redisKindString TTL routing appends every !redis|ttl| record to strings_ttl.jsonl, but HandleString already extracts inline TTL from new-format string values and writes that same key to the same sidecar. In current Redis writes, string commits emit both encoded string TTL and a !redis|ttl| scan-index entry (see adapter/redis_lua_context.go stringCommitElems), so expiring string keys will be duplicated in the sidecar, violating one-record-per-key assumptions and creating ambiguous restore input.

Useful? React with 👍 / 👎.

Comment thread internal/backup/redis_string.go Outdated
if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil { //nolint:mnd // 0755 == standard dir mode
return nil, cockroachdberr.WithStack(err)
}
f, err := os.Create(path) //nolint:gosec // path is composed from output-root + fixed file name
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject symlink targets when opening TTL sidecar files

Creating TTL sidecars with os.Create(path) follows symlinks, so if strings_ttl.jsonl or hll_ttl.jsonl is a symlink in the output tree, backup generation will truncate and overwrite the symlink target outside the intended directory. Blob writes already defend against this (writeFileAtomic refuses symlink overwrites), so sidecar creation currently bypasses the same safety boundary.

Useful? React with 👍 / 👎.

Codex round 5 raised two correctness issues:

1. P1: !redis|ttl| scan-index records duplicated TTL for new-format
   strings. The live Redis encoder writes both !redis|str|<k> (with
   TTL embedded inline in the magic-prefix header) and the scan-index
   !redis|ttl|<k> for every expiring string (see
   adapter/redis_lua_context.go stringCommitElems). HandleString
   wrote strings_ttl.jsonl from the inline TTL; HandleTTL then routed
   the redundant !redis|ttl| record to the same sidecar, violating
   the one-record-per-key contract sidecar consumers rely on. Track
   inline-emitted keys in a new inlineTTLEmitted set and drop the
   scan-index TTL when the same key was already covered by the
   inline header. Legacy (no-magic-prefix) strings still flow
   through the !redis|ttl| path because their TTL only lives there.

2. P2: openJSONL followed symlinks. writeFileAtomic already refuses
   to clobber a symlink at a blob path; openJSONL did not. A
   strings_ttl.jsonl or hll_ttl.jsonl symlink in the output tree
   would otherwise be followed by os.Create, truncating a target
   outside the dump tree. Mirror the writeFileAtomic Lstat guard.

Tests:
  TestRedisDB_NewFormatStringTTLNotDuplicatedByScanIndex,
  TestRedisDB_OpenJSONLRefusesSymlinkOverwrite.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 30, 2026

@claude review

Round 2 fixes for Codex round 5 on commit 84a1c5a:

  1. P1: deduplicate string TTL records. The live Redis encoder commits both !redis|str|<k> (with TTL embedded inline in the magic-prefix header) and the scan-index !redis|ttl|<k> for every expiring string (adapter/redis_lua_context.go stringCommitElems). HandleString was already writing strings_ttl.jsonl from the inline TTL and then HandleTTL was routing the redundant !redis|ttl| record to the same sidecar — duplicating the entry and breaking the one-record-per-key contract. Now RedisDB tracks inline-emitted keys in a new inlineTTLEmitted set and HandleTTL drops the scan-index TTL when the same key was already covered inline. Legacy (no-magic-prefix) strings still flow through the !redis|ttl| path because their TTL only lives there.

  2. P2: refuse symlink overwrite in openJSONL. writeFileAtomic already Lstat-checks before overwriting; openJSONL was using os.Create directly, which follows symlinks. A strings_ttl.jsonl or hll_ttl.jsonl symlink in the output tree would have truncated a target outside the dump tree. Mirror the same defence in openJSONL.

Tests: TestRedisDB_NewFormatStringTTLNotDuplicatedByScanIndex, TestRedisDB_OpenJSONLRefusesSymlinkOverwrite. go test -race and golangci-lint clean. Fix merged forward into #714/#716/#718.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 3m 0s —— View job


Review of PR #713 — Round 2 (commit c699c15)

  • Gather context (diff, changed files)
  • Pass 1: Data loss
  • Pass 2: Concurrency / distributed failures
  • Pass 3: Performance
  • Pass 4: Data consistency
  • Pass 5: Test coverage
  • Verify P1 (deduplicate string TTL) and P2 (symlink guard in openJSONL)

Codex P1 + P2 — Both Cleanly Fixed ✓

P1 — No more duplicate TTL entries for new-format strings (redis_string.go:192)

The flow through all three string cases is now correct:

Case inlineTTLEmitted set? HandleTTL behaviour
New-format string WITH inline TTL ✅ Yes, in HandleString Skips appendTTL, drops redundant scan-index record
New-format string WITHOUT TTL No (early return at expireAtMs==0) No !redis|ttl| record exists in snapshot — nothing to call
Legacy string (no magic prefix) No (decodeRedisStringValue returns expireAtMs=0) Falls through to appendTTL — correct, TTL lives only in the scan-index

The one-record-per-key invariant is preserved for all three cases. TestRedisDB_NewFormatStringTTLNotDuplicatedByScanIndex asserts exactly 1 record in the sidecar after a HandleString + HandleTTL sequence. ✓

P2 — openJSONL now refuses symlinks (redis_string.go:408)

The Lstat-before-Create guard mirrors writeFileAtomic exactly. Both have the same TOCTOU window, but for an offline backup tool running in a controlled output directory, this is an accepted and consistent tradeoff. TestRedisDB_OpenJSONLRefusesSymlinkOverwrite verifies the bait file is untouched and the error message is correct. ✓


Pass Summaries

  1. Data loss — Atomic write semantics unchanged; the new inlineTTLEmitted map is write-only within a key's lifetime and does not affect flush/close paths. No new data-loss vectors. The prior noted issues (missing fsync, second closeJSONL error swallowed in Finalize) remain but are pre-existing nits.

  2. Concurrency — Single-threaded by design and documented; inlineTTLEmitted access is sequential. No issues.

  3. PerformanceinlineTTLEmitted grows proportional to expiring strings, which is inherent and bounded by the actual keyspace. The map lookup in HandleTTL is O(1) on the already-hot path. No regressions.

  4. Data consistency — The deduplication logic is correct for all three string formats. Legacy strings correctly bypass inlineTTLEmitted since decodeRedisStringValue returns expireAtMs=0 for non-magic-prefix values, leaving inlineTTLEmitted unpopulated. ✓

  5. Test coverage — 17 tests now. Both regression tests exercise the correct boundary: P1 tests the sidecar count after a real duplicate sequence; P2 confirms the bait file is untouched and the error message matches.


Residual Items (Carried from Round 1 — Not Blocking)

Two medium items from the previous review pass remain open:

1. Dropped-TTL count understated when cap is hit (redis_string.go:250)

When the pendingWideColumnTTL cap (1_000_000) is hit, Finalize warns with "count", len(r.pendingWideColumnTTL) — which is always ≤ 1_000_000. If 2M orphan TTLs arrived, the operator sees count: 1000000 with no indication that another 1M were silently dropped. A droppedWideColumnTTL int counter (incremented in the else branch of the cap check) would let Finalize report "dropped", r.droppedWideColumnTTL separately.

Fix this →

2. TestRedisDB_PendingWideColumnTTLBounded does not exercise the drop path (redis_string_test.go)

The test drives 10K entries (1% of the 1M cap) and verifies they all land. The else branch (the actual drop) is executed by zero tests, so it's untested dead code from coverage's perspective. Adding a package-level var maxPendingWideColumnTTL = 1_000_000 instead of a const (or a withMaxPendingTTL option on RedisDB) would let tests inject a tiny cap and verify entry n+1 is dropped and len(db.pendingWideColumnTTL) == n.


Overall: The two Codex round-5 issues are cleanly resolved with correct logic and matching regression tests. Ready to merge pending the two medium carry-overs above (or deferring them to a follow-up alongside the wide-column encoders, when the cap becomes load-bearing).
| Branch

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c699c15ca5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/backup/redis_string.go Outdated
if info, err := os.Lstat(path); err == nil && info.Mode()&os.ModeSymlink != 0 {
return nil, cockroachdberr.WithStack(cockroachdberr.Newf("backup: refusing to overwrite symlink at %s", path))
}
f, err := os.Create(path) //nolint:gosec // path is composed from output-root + fixed file name
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Eliminate symlink race when creating TTL sidecars

The os.Lstat check is separated from os.Create, so a process that can write the output directory can swap strings_ttl.jsonl/hll_ttl.jsonl to a symlink between those calls and still get the symlink target truncated. This reintroduces the overwrite primitive the guard is trying to prevent in concurrent or adversarial local environments; the open needs to be atomic with no-follow semantics (or written via temp+rename) to actually enforce the boundary.

Useful? React with 👍 / 👎.

Comment thread internal/backup/redis_string.go Outdated
// the warning sink at Finalize.
if len(r.pendingWideColumnTTL) < maxPendingWideColumnTTL {
r.pendingWideColumnTTL = append(r.pendingWideColumnTTL, redisTTLPending{
UserKey: bytes.Clone(userKey),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid buffering full orphan TTL keys before Finalize

HandleTTL clones every unknown-key TTL into pendingWideColumnTTL, but Finalize only reads len(...) for warning output and never consumes the stored key/expiry payloads. On snapshots with many unsupported wide-column TTLs (especially with large user keys), this can allocate large, unnecessary memory and defeat the intended OOM protection despite the record-count cap.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1c70e83da4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

}

func (r *RedisDB) writeBlob(subdir string, userKey, value []byte) error {
encoded := EncodeSegment(userKey)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Emit keymap entries for SHA-fallback Redis keys

writeBlob stores filenames from EncodeSegment(userKey), but this code never records KEYMAP entries when EncodeSegment takes its SHA-fallback path for long keys. For Redis keys whose encoded segment is non-reversible (>240-byte or promoted fallback cases), restore tooling cannot recover the original key bytes from *.bin/TTL metadata alone, causing irreversible key-name loss for affected records.

Useful? React with 👍 / 👎.

Comment on lines +360 to +362
if !isNewRedisStrFormat(value) {
return value, 0, nil
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Stop misclassifying FF01-prefixed legacy strings

The new-format probe only checks the first two bytes (0xFF,0x01), so any legacy raw string value that happens to start with that prefix is parsed as encoded format and has bytes stripped (or can trigger a false TTL parse). This silently corrupts restored values for binary-safe legacy Redis strings matching that prefix pattern; the decoder needs a stricter discriminator than two-byte magic alone.

Useful? React with 👍 / 👎.

Codex round 6 raised two issues on commit 1c70e83:

1. P1: openJSONL had a TOCTOU window between Lstat and os.Create.
   A process that can write the output directory could swap the
   sidecar path to a symlink between the two syscalls and still
   get the target truncated, defeating the round-5 guard.
   Replace the two-step pattern with a single os.OpenFile call
   carrying O_WRONLY|O_CREATE|O_TRUNC|O_NOFOLLOW so the open
   itself refuses symlinks atomically; on Linux/macOS/BSD this
   returns ELOOP, on Windows O_NOFOLLOW is 0 (matches the
   different filesystem-permission model). Wrap the syscall
   error with the same "refusing to overwrite symlink" message
   so the existing test's substring assertion still holds.

2. P2: HandleTTL was buffering each orphan TTL record's full
   user key into pendingWideColumnTTL, but Finalize only ever
   read len(...) for the warning sink. With user keys up to 1 MiB
   each (store/mvcc_store.go maxSnapshotKeySize), this could
   allocate gigabytes for no benefit. Replace the slice with a
   plain `orphanTTLCount int`. The wide-column-encoder PR will
   re-introduce per-key tracking when those encoders actually
   consume the keys.

Tests:
  - TestRedisDB_OpenJSONLRefusesSymlinkOverwrite — comment
    updated to reflect the atomic open semantics.
  - TestRedisDB_OrphanTTLCountedNotBuffered — replaces
    TestRedisDB_PendingWideColumnTTLBounded; asserts the count
    field instead of slice length.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 30, 2026

@claude review

Round 3 fixes for Codex round 6 on commit 1c70e83:

  1. P1 — TOCTOU race on JSONL symlink check. The Lstat-then-os.Create pattern from round 5 had a window where a writable output directory could be swapped to a symlink between the two syscalls, defeating the guard. Replaced with a single os.OpenFile(path, O_WRONLY|O_CREATE|O_TRUNC|syscall.O_NOFOLLOW, 0o600) call so the kernel refuses symlinks atomically (ELOOP on Linux/macOS/BSD; syscall.O_NOFOLLOW == 0 on Windows where the filesystem-permission model is different anyway). The existing TestRedisDB_OpenJSONLRefusesSymlinkOverwrite test still asserts the same substring on the wrapped error.

  2. P2 — orphan TTL keys buffered but unused. HandleTTL was bytes.Clone-ing each orphan user key into pendingWideColumnTTL, but Finalize only read len(...) for the warning sink. With user keys up to 1 MiB (maxSnapshotKeySize), this could allocate gigabytes for no benefit. Replaced the slice with a plain orphanTTLCount int and removed the now-unused redisTTLPending struct + maxPendingWideColumnTTL cap. The wide-column-encoder PR will re-introduce per-key tracking when those encoders actually consume the keys. Test TestRedisDB_PendingWideColumnTTLBounded replaced by TestRedisDB_OrphanTTLCountedNotBuffered.

go test -race and golangci-lint clean. Fix merged forward into #714 / #716 / #718.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 4m 1s —— View job


Review of PR #713 — Round 3 (commit d965b65)

  • Gather context (diff, changed files)
  • Pass 1: Data loss
  • Pass 2: Concurrency / distributed failures
  • Pass 3: Performance
  • Pass 4: Data consistency
  • Pass 5: Test coverage
  • Verify P1 (O_NOFOLLOW TOCTOU fix) and P2 (orphanTTLCount replacement)
  • Check new Codex round 6 P1 items (SHA-fallback keymap, FF01-prefix misclassification)

Round 3 Codex Items — Both Cleanly Fixed ✓

P1 — openJSONL TOCTOU closed with O_NOFOLLOW (redis_string.go:401)

The O_WRONLY | O_CREATE | O_TRUNC | syscall.O_NOFOLLOW combination is the correct fix. The open syscall itself is the guard — no race window exists between check and open. ELOOP detection via errors.Is correctly unwraps through *os.PathError. The Windows no-op (O_NOFOLLOW == 0) is correctly documented and acceptable given the Windows file-permission model. TestRedisDB_OpenJSONLRefusesSymlinkOverwrite confirms the bait file is untouched and the error message matches. ✓

P2 — orphanTTLCount int replaces buffered slice (redis_string.go:115)

Clean and correct. The key insight — that Finalize only ever read len() of the slice, never the payloads — justifies removing the struct, the cap constant, and all the associated memory pressure. The comment on lines 112–114 explains the design choice well. TestRedisDB_OrphanTTLCountedNotBuffered drives 10,000 orphan records and asserts the exact count. ✓


New Codex Round 6 Items (from review of 1c70e83d) — Not Yet Addressed

These were raised in the Codex review posted after the trigger comment was written. Both are still present in d965b65.

Medium

1. SHA-fallback keys lose identity in the blob tree (redis_string.go:302)

writeBlob calls EncodeSegment(userKey) and uses the result as the filename, but never checks IsShaFallback(encoded). When a Redis key exceeds 240 bytes (post-encoding), EncodeSegment returns a SHA-fallback segment — a non-reversible <32-hex-chars>__<truncated-suffix> name. Without a matching KEYMAP.jsonl entry, restore tooling cannot recover the original key bytes. The same gap exists in appendTTL (the "key" field of the TTL sidecar record is the SHA-fallback segment).

Fix requires wiring a *KeymapWriter into RedisDB (either via NewRedisDB or a WithKeymapWriter option) and emitting records in both writeBlob and appendTTL when IsShaFallback(encoded) is true.

Fix this →

2. FF01-prefixed legacy strings misclassified as new format (redis_string.go:362)

isNewRedisStrFormat returns true for any value whose first two bytes are 0xFF 0x01. A legacy Redis string that the user stored with those leading bytes would have its first 3 bytes stripped as a "header" (or trigger a false TTL parse if value[2]&0x01 != 0), silently corrupting the restored value.

This is also a live-code issue: the live decoder in adapter/redis_compat_types.go:73 uses the same two-byte discriminator. The backup decoder inherits the ambiguity rather than introducing it. That said, documenting it explicitly here — either as a doc comment on isNewRedisStrFormat noting the assumption that no legacy value starts with 0xFF 0x01, or as a test asserting the known behavior for that edge case — would make the boundary explicit so a future migration can address it.

Fix this →


Pass Summaries

  1. Data lossO_NOFOLLOW + O_CREATE | O_TRUNC correctly prevents overwriting a symlink target via the JSONL sidecar open. writeFileAtomic tmp+rename is sound. The one pre-existing gap is missing KEYMAP entries for SHA-fallback keys (item Configure Renovate #1 above); for keys > 240 bytes the blob is written but the key identity is unrecoverable at restore time.
  2. Concurrency — Single-threaded by design and documented. orphanTTLCount is sequential. No issues.
  3. PerformanceorphanTTLCount is O(1) per orphan record; ensureDir caching is correct. inlineTTLEmitted map grows proportional to expiring strings (bounded by snapshot size). No regressions.
  4. Data consistency — Magic bytes verified byte-for-byte against adapter/redis_compat_types.go. Deduplication logic via inlineTTLEmitted is correct for all three string formats. The FF01-prefix ambiguity (item Update dependency go to v1.21.4 #2) is pre-existing in the live code and is inherited; worth documenting.
  5. Test coverage — 18 tests. Both Round 3 regression tests are correct. Gap: SHA-fallback KEYMAP emission path (item Configure Renovate #1) is untested; FF01-prefix edge case (item Update dependency go to v1.21.4 #2) has no test asserting current behavior.

Overall: The two Round 3 items are cleanly resolved with the right kernel-level primitives and minimal, well-commented code. The two new Codex items (#1 and #2 above) are the remaining work — #1 (SHA-fallback KEYMAP) is a correctness gap for long Redis keys, #2 (FF01-prefix) is a documentation/acknowledgement gap inherited from the live encoder.
| Branch

…nd 4)

Codex P1 round 7 (commit d965b65):

`writeBlob` and `appendTTL` were storing user keys as
`EncodeSegment(userKey)` filenames / JSONL row keys, but never
recording a KEYMAP.jsonl entry when EncodeSegment took its
SHA-fallback path. For Redis keys longer than the 240-byte ceiling
(or those whose percent-encoded form was promoted to the fallback
to disambiguate from the binary `b64.` prefix or the SHA-fallback
shape), the encoded segment is non-reversible — so without a
KEYMAP record, the original Redis user key bytes were lost in the
backup output.

Wire a lazy-opened, atomic-no-follow `KEYMAP.jsonl` writer into
`RedisDB`. Both `writeBlob` and `appendTTL` now call
`recordIfFallback(encoded, userKey)` which writes a
`KindSHAFallback` record on first encounter. `closeKeymap` (called
from Finalize) flushes + closes the file; if no fallback records
were ever emitted, the empty file is removed so dumps without any
non-reversible keys carry no spurious file (matches the s3
encoder's policy).

Tests:
  - TestRedisDB_SHAFallbackKeymapped — drives a 300-byte key,
    asserts KEYMAP.jsonl exists and round-trips back to the
    original bytes.
  - TestRedisDB_NoKeymapWhenAllReversible — short-key dump
    produces no KEYMAP file.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d965b6598c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/backup/redis_string.go Outdated
// kernel supports it (Linux, macOS, BSD) this returns ELOOP for
// a symlink path. On Windows syscall.O_NOFOLLOW is 0 (no-op),
// matching Windows's different filesystem-permission model.
const flag = os.O_WRONLY | os.O_CREATE | os.O_TRUNC | syscall.O_NOFOLLOW
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard O_NOFOLLOW usage behind Unix-specific build logic

openJSONL is compiled for all targets but unconditionally references syscall.O_NOFOLLOW; this symbol is not defined on Windows, so the package no longer builds there (reproduced with GOOS=windows GOARCH=amd64 go test ./internal/backup -run TestDoesNotExist, which fails at this line). This introduces a cross-platform build regression and conflicts with the surrounding comment that implies Windows compatibility.

Useful? React with 👍 / 👎.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 30, 2026

@claude review

Round 4 fixes for Codex round 7 on commit d965b65:

  1. P1 — SHA-fallback KEYMAP entries (commit 4cb6d837). writeBlob and appendTTL now call recordIfFallback(encoded, userKey) which lazily opens a per-DB KEYMAP.jsonl (with the same atomic O_NOFOLLOW open as the TTL sidecars) and writes a KindSHAFallback record whenever EncodeSegment took its non-reversible path. Without this, restore tools cannot recover Redis user keys longer than 240 bytes (or those promoted to the fallback by b64. / SHA-shape collisions). closeKeymap (called from Finalize) removes the file if no fallback records were ever emitted, matching the s3 encoder's empty-keymap policy. Tests: TestRedisDB_SHAFallbackKeymapped (round-trips a 300-byte key) and TestRedisDB_NoKeymapWhenAllReversible.

  2. P1 — isNewRedisStrFormat 0xFF 0x01 ambiguity (deferred). This is a pre-existing limitation of the live encoder/decoder (adapter/redis_compat_types.go shares the exact same two-byte discriminator). The backup decoder must mirror the live decoder's parse, otherwise a backup→restore round-trip would diverge from the value the live cluster was already returning to clients for that key. Strengthening the discriminator requires a coordinated wire-format change to the live encoder (and a migration plan for existing 0xFF-0x01-prefixed legacy values), which is out of scope for the Phase 0a backup decoder PR. Filed as a follow-up; happy to write the design doc separately if that's the right next step.

go test -race and golangci-lint clean. SHA-fallback fix merged forward into #714 / #716 / #718.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 4m 21s —— View job


Review of PR #713 — Round 4 (commit 4cb6d83)

  • Gather diff and read changed files
  • Pass 1: Data loss
  • Pass 2: Concurrency / distributed failures
  • Pass 3: Performance
  • Pass 4: Data consistency
  • Pass 5: Test coverage
  • Verify P1 (SHA-fallback KEYMAP entries) and P2 (FF01 ambiguity deferral)

Codex Round 7 Items

P1 — SHA-fallback KEYMAP entries: ✓ Cleanly resolved

recordIfFallback is wired into both writeBlob and appendTTL. Flow is correct across all paths:

Path recordIfFallback called KEYMAP record emitted
Short key, no TTL ✓ (IsShaFallback → false, returns nil) No
Short key, with TTL ✓ (IsShaFallback → false, returns nil) No
Long key, no TTL ✓ from writeBlob only 1 record
Long key, with inline TTL ✓ from writeBlob + appendTTL (via HandleString) 2 records (same mapping)
HLL long key + TTL ✓ from writeBlob + appendTTL (via HandleTTL) 2 records (same mapping)

The lazy-open with O_NOFOLLOW mirrors openJSONL exactly. closeKeymap's empty-file removal is correct — r.keymap.Count() is only 0 when every WriteOriginal call failed (which means the caller got an error and aborted before Finalize). LoadKeymap's "last record wins" semantics are verified: out[rec.Encoded] = rec — duplicate records with the same key produce the same mapping. ✓

P1 — FF01 ambiguity (deferred): ✓ Rationale is sound

The argument holds: the backup decoder must produce bit-for-bit identical output to what the live cluster returns for a given stored value. Changing the two-byte discriminator in the backup decoder without changing the live encoder would cause backup→restore round-trips to diverge from the live cluster's behavior — exactly the wrong invariant. A coordinated wire-format change is the right path, and a follow-up design doc is the right vehicle.

P2 round 7 — O_NOFOLLOW Windows build (not addressed in Round 4)

Codex raised that syscall.O_NOFOLLOW may not be defined on Windows and that openJSONL would break Windows builds. The Round 3 review accepted the existing openJSONL pattern based on the comment "On Windows syscall.O_NOFOLLOW is 0 (no-op)." Round 4 adds recordIfFallback with the identical pattern. This comment needs verification — if syscall.O_NOFOLLOW is genuinely absent on Windows rather than just 0, both openJSONL and recordIfFallback would fail to compile there. A one-line cross-compile check (GOOS=windows GOARCH=amd64 go build ./internal/backup/) would settle it. If the symbol is absent, the standard fix is a build-tagged shim (backup_nofollow_unix.go + backup_nofollow_other.go) or using golang.org/x/sys/unix.O_NOFOLLOW behind a build tag.


Additional Findings

Low

1. Duplicate KEYMAP records for expiring SHA-fallback keys (redis_string.go:317, redis_string.go:340)

For any expiring SHA-fallback string key, HandleString calls writeBlobrecordIfFallback (record #1) then appendTTLrecordIfFallback (record #2). Both records carry identical encoded and original bytes, so LoadKeymap sees them as the same entry. Correctness is not affected, but KEYMAP.jsonl is 2× larger than necessary for such keys. For a dump with 1M expiring long-key strings this doubles the KEYMAP file size.

A seenFallbacks map[string]struct{} on RedisDB (checked inside recordIfFallback before calling WriteOriginal) would eliminate the duplicates. Not blocking — just a space inefficiency.

2. closeKeymap swallows the keymapFile.Close() error when flush also fails (redis_string.go:393)

flushErr := r.keymap.Close()
closeErr := r.keymapFile.Close()
if flushErr == nil && closeErr != nil {
    flushErr = closeErr
}

If both flush and close fail (e.g., NFS under heavy I/O), only flushErr is returned. This is consistent with the pre-existing closeJSONL first-error pattern carried through the whole file, so it's not a new regression — just noting it for completeness.


Pass Summaries

  1. Data lossrecordIfFallback correctly emits KEYMAP records for both blob and TTL paths. O_TRUNC on lazy open is correct (fresh dump semantics). closeKeymap's empty-file removal is safe. No new data-loss vectors.
  2. Concurrency — Single-threaded by design. recordIfFallback lazy-init of r.keymap is sequential. No issues.
  3. PerformanceIsShaFallback is O(34) string prefix check on every blob/TTL write — negligible. The duplicate-record concern (finding Configure Renovate #1) is a space, not time, cost.
  4. Data consistencyLoadKeymap last-record-wins confirmed at keymap.go:238. Duplicate records for expiring fallback keys produce the same mapping. O_TRUNC ensures a clean KEYMAP for each dump session. ✓
  5. Test coverage — 19 tests total. TestRedisDB_SHAFallbackKeymapped exercises both writeBlob and appendTTL paths because HandleString with inline TTL calls both. The round-trip via LoadKeymap + rec.Original() is thorough. TestRedisDB_NoKeymapWhenAllReversible covers the no-fallback path. ✓

Overall: The SHA-fallback KEYMAP fix is cleanly implemented and its tests are solid. The FF01 deferral is well-argued. The only open item from Codex round 7 is the Windows build question — a quick GOOS=windows go build verifies or refutes it with one command. If it compiles, close the item; if not, a two-file build-tag shim resolves it in minutes.
| Branch

…d 5)

Codex P2 round 7 (commit d965b65):

`syscall.O_NOFOLLOW` is not defined on Windows, so the round-3 fix
that used it directly broke the cross-platform build
(`GOOS=windows GOARCH=amd64 go build ./internal/backup` failed at
`undefined: syscall.O_NOFOLLOW`).

Extract the platform-specific open into a small build-tagged
helper:
  - open_nofollow_unix.go (`//go:build unix`) keeps the atomic
    O_NOFOLLOW open with ELOOP wrapping.
  - open_nofollow_windows.go (`//go:build windows`) uses Lstat
    then OpenFile. The residual TOCTOU window is acceptable on
    Windows because mounting a successful attack on the dump
    tree there already requires write access to the output
    directory plus SeCreateSymbolicLinkPrivilege — a much higher
    bar than the unix case the round-3 fix was hardening
    against.

Both `openJSONL` and the keymap-creation path in
`recordIfFallback` now go through `openSidecarFile`, eliminating
the duplicate inline OpenFile blocks.

Verified: `go test -race` passes on Linux/macOS;
`GOOS=windows GOARCH=amd64 go build ./internal/backup/...` now
succeeds (was the Codex repro).
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 30, 2026

@claude review

Round 5 fix for Codex P2 round 7 on commit d965b65 (commit 6f354464):

syscall.O_NOFOLLOW is not defined on Windows, so the round-3 fix that used it directly broke the cross-platform build. Reproduced with GOOS=windows GOARCH=amd64 go build ./internal/backup failing at undefined: syscall.O_NOFOLLOW.

Extracted the per-platform open into a build-tagged helper openSidecarFile:

  • open_nofollow_unix.go (//go:build unix) keeps the atomic O_NOFOLLOW open with ELOOP wrapping.
  • open_nofollow_windows.go (//go:build windows) uses Lstat-then-OpenFile. The residual TOCTOU window is acceptable on Windows because mounting a successful attack already requires output-directory write access plus SeCreateSymbolicLinkPrivilege — a much higher bar than the unix case.

Both openJSONL and the keymap-creation path in recordIfFallback now go through the helper. go test -race passes on Linux/macOS; GOOS=windows GOARCH=amd64 go build ./internal/backup/... now succeeds (was the Codex repro). Lint clean. Fix merged forward into #714 / #716 / #718.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 3m 26s —— View job


Review of PR #713 — Round 5 (commit 6f35446)

  • Read changed files (open_nofollow_unix.go, open_nofollow_windows.go, redis_string.go, redis_string_test.go)
  • Pass 1: Data loss
  • Pass 2: Concurrency / distributed failures
  • Pass 3: Performance
  • Pass 4: Data consistency
  • Pass 5: Test coverage
  • Verify Codex P2 round 7 (Windows build fix)

Codex P2 Round 7 — Windows Build Fix: ✓ Cleanly Resolved

The openSidecarFile helper correctly splits on //go:build unix / //go:build windows:

  • Unix (open_nofollow_unix.go): O_NOFOLLOW is ORed in at the kernel syscall level — ELOOP fires atomically with no TOCTOU window; errors.Is(err, syscall.ELOOP) correctly unwraps through *os.PathError via Go 1.13+ Unwrap().
  • Windows (open_nofollow_windows.go): Lstat-then-OpenFile with documented TOCTOU trade-off; rationale (write-access + SeCreateSymbolicLinkPrivilege required) is well-argued and consistent with the pre-existing writeFileAtomic pattern.
  • Both openJSONL and recordIfFallback route through the helper. ✓

syscall.O_NOFOLLOW is no longer referenced from any non-unix-tagged file, which is exactly the Codex repro fix.


Additional Findings

Medium

1. open_nofollow_windows.go OS error is not wrapped (open_nofollow_windows.go:25)

// Windows
return os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600)  // unwrapped

// Unix
return nil, cockroachdberr.WithStack(err)                             // wrapped

The Unix counterpart wraps every OS error with cockroachdberr.WithStack; the Windows variant returns os.OpenFile's error bare. Since //go:build windows is excluded from golangci-lint on Linux CI, wrapcheck never fires there. But if the project adds a GOOS=windows golangci-lint CI step (or someone lints locally on Windows), this line would be flagged. One-line fix:

Fix this →

Low / Nits

2. No build tag for non-unix, non-windows targets (open_nofollow_unix.go, open_nofollow_windows.go)

openSidecarFile is defined only for unix and windows. plan9, wasip1, and js/wasm would fail at link time ("undefined: openSidecarFile"). For an offline backup tool targeting desktop/server environments this is a non-issue in practice, but an open_nofollow_other.go with //go:build !unix && !windows (falling back to the Lstat pattern) would make the coverage explicit.

3. TestRedisDB_OpenJSONLRefusesSymlinkOverwrite is Unix-path-only (redis_string_test.go:208)

The test exercises open_nofollow_unix.go's ELOOP-unwrapping path. The Windows Lstat-then-OpenFile code path is structurally untestable in Linux CI — expected, but a short comment on the test noting this would prevent future confusion.


Pass Summaries

  1. Data loss — Build-tag split is correct. Both variants use O_TRUNC | O_CREATE semantics; a fresh dump starts with an empty sidecar. Symlink guard is enforced on both platforms. No new data-loss vectors.
  2. Concurrency — Single-threaded by design; unchanged. The two new files add no shared state.
  3. Performance — One extra call frame per sidecar open (negligible). No regressions.
  4. Data consistencyopenJSONL and recordIfFallback both route through openSidecarFile. Truncate+create semantics are preserved across both implementations. ✓
  5. Test coverage — 19 tests. TestRedisDB_OpenJSONLRefusesSymlinkOverwrite continues to cover the Unix symlink guard; all tests that write JSONL implicitly cover the new-file O_TRUNC path. No regressions from the refactor.

Overall: The Windows build regression is cleanly resolved. The openSidecarFile abstraction is the right primitive — it makes the per-platform trade-offs explicit and co-located rather than scattered through callers. The single medium finding (unwrapped Windows OS error) is a one-line fix; findings #2 and #3 are cosmetic. Ready to merge pending that fix or explicit deferral.
| Branch

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6f35446404

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/backup/open_nofollow_unix.go Outdated
Comment on lines +20 to +21
const flag = os.O_WRONLY | os.O_CREATE | os.O_TRUNC | syscall.O_NOFOLLOW
f, err := os.OpenFile(path, flag, 0o600) //nolint:gosec,mnd // path is composed from output-root + fixed file name; 0600 is the standard owner-only mode
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Refuse hard-link targets when creating sidecar files

openSidecarFile now blocks symlinks, but it still opens existing files with O_TRUNC, which will truncate the underlying inode even when the path is a hard link. In a local-adversary scenario where another process can write the dump directory, it can pre-create strings_ttl.jsonl/hll_ttl.jsonl as a hard link to another writable file outside the backup tree, and this call will clobber that external file despite the symlink guard.

Useful? React with 👍 / 👎.

bootjp added 2 commits April 30, 2026 21:23
Codex P2 round 9 (commit 87971ae):

The round-5 unix sidecar opener used O_NOFOLLOW so the kernel
refuses symlinks atomically, but it kept O_TRUNC on the same call.
A local adversary who can write the output directory could
pre-create strings_ttl.jsonl / hll_ttl.jsonl / KEYMAP.jsonl as a
hard link to a writable file outside the dump tree (e.g.
/etc/passwd) and the open's O_TRUNC would zero that inode despite
the symlink guard.

Restructure the unix open:

  1. Open with O_WRONLY|O_CREATE|O_NOFOLLOW (no O_TRUNC).
  2. fstat() the descriptor.
  3. If sysStat.Nlink > 1, refuse with a wrapped error and close.
  4. Otherwise call f.Truncate(0) — atomic with the file we just
     verified is single-linked.

The Windows build (open_nofollow_windows.go) is unchanged because
its symlink/hard-link attack model is materially different
(SeCreateSymbolicLinkPrivilege required, ACLs gate the directory)
and the platform's link-count semantics under fstat are nontrivial.

Test:
  - TestRedisDB_OpenJSONLRefusesHardLinkClobber — pre-creates
    strings_ttl.jsonl as a hard link to a bait file, exercises
    HandleString (which lazy-opens the sidecar), asserts the
    "refusing to overwrite hard-linked file" error and that the
    bait file is untouched.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 87971aef96

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

if _, ok := r.dirsCreated[dir]; ok {
return nil
}
if err := os.MkdirAll(dir, 0o755); err != nil { //nolint:mnd // 0755 == standard dir mode
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Reject symlinked parent dirs before writing backup files

The path hardening only checks the final file (os.Lstat(path) in writeFileAtomic and O_NOFOLLOW in openSidecarFile), but it never verifies parent directories, so a pre-existing symlink like redis/db_0/strings -> /tmp/outside causes HandleString to write blobs outside outRoot. This bypasses the intended “don’t follow links” safety boundary for both blob files and JSONL sidecars whenever an attacker (or accidental setup) can place directory symlinks in the output tree.

Useful? React with 👍 / 👎.

Comment thread internal/backup/redis_string.go Outdated
if err == nil {
return false
}
return errors.Is(err, syscall.ENOSPC)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Detect Windows disk-full errors in out-of-space helper

IsBlobAtomicWriteOutOfSpace only checks errors.Is(err, syscall.ENOSPC), which works on POSIX but misses Windows disk-full write failures (typically Win32 errno 112 / ERROR_DISK_FULL). On Windows this means real out-of-space errors can be reported as generic failures, so retry/alarm logic that depends on this helper won’t classify them correctly.

Useful? React with 👍 / 👎.

…d 7)

Codex round 9 raised two issues on commit 87971ae:

1. P1: O_NOFOLLOW only blocks the trailing component. A
   pre-existing directory symlink anywhere up the path (e.g.
   `<outRoot>/redis/db_0/strings -> /tmp/outside`) let MkdirAll
   silently honor it, then writeFileAtomic / openSidecarFile
   wrote outside outRoot. ensureDir now Lstat-walks each
   ancestor under outRoot via assertNoSymlinkAncestors and
   refuses if any is a symlink. This is best-effort against
   TOCTOU (an adversary can race us at any boundary), but
   closes the much more common case of a stale or
   misconfigured symlink in the dump tree. Full race-free
   traversal would need os.Root / openat which is out of
   scope at this layer.

2. P2: IsBlobAtomicWriteOutOfSpace only checked syscall.ENOSPC,
   missing Windows disk-full failures (ERROR_DISK_FULL = 112,
   ERROR_HANDLE_DISK_FULL = 39). Retry/alarm logic that keys
   off this helper would silently misclassify Windows disk
   exhaustion. Move the platform-specific check into
   disk_full_unix.go and disk_full_windows.go; the latter
   accepts both Win32 codes plus ENOSPC for the Go-translated
   case.

Tests:
  - TestRedisDB_RefusesSymlinkedAncestor: pre-places a
    `redis/db_0/strings -> /tmp/...` symlink before HandleString
    and asserts the ancestor refusal + that the bait dir is
    untouched.

Verified:
  - go test -race ./internal/backup/... clean.
  - GOOS=windows GOARCH=amd64 go build ./internal/backup/... clean.
  - golangci-lint clean.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 30, 2026

@claude review

Round 7 fixes for Codex round 9 on commit 87971ae (commit e061b6d0):

  1. P1 — symlinked ancestors bypassed the guard. O_NOFOLLOW and Lstat(path) only check the trailing component. A pre-existing directory symlink anywhere up the path (e.g. <outRoot>/redis/db_0/strings -> /tmp/outside) let MkdirAll silently honor it and steered subsequent writes outside outRoot. ensureDir now Lstat-walks each ancestor under outRoot via assertNoSymlinkAncestors and refuses if any is a symlink. Best-effort against TOCTOU (full race-free traversal would need os.Root / openat); closes the much more common stale-symlink-in-output case. Test: TestRedisDB_RefusesSymlinkedAncestor.

  2. P2 — Windows disk-full not detected. IsBlobAtomicWriteOutOfSpace only checked syscall.ENOSPC, missing ERROR_DISK_FULL (112) / ERROR_HANDLE_DISK_FULL (39). Moved the platform-specific check into disk_full_unix.go (ENOSPC) and disk_full_windows.go (both Win32 codes + ENOSPC for the Go-translated case via golang.org/x/sys/windows).

go test -race, GOOS=windows GOARCH=amd64 go build, and golangci-lint clean. Fix merged forward into #714 / #716 / #718.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 3m 9s —— View job


Review of PR #713 — Round 7 (commit e061b6d)

  • Read changed files (redis_string.go, open_nofollow_unix.go, disk_full_unix.go, disk_full_windows.go, redis_string_test.go)
  • Pass 1: Data loss
  • Pass 2: Concurrency / distributed failures
  • Pass 3: Performance
  • Pass 4: Data consistency
  • Pass 5: Test coverage
  • Verify P1 (symlinked ancestors guard)
  • Verify P2 (Windows disk-full detection)

Codex Round 9 Items — Both Cleanly Fixed ✓

P1 — assertNoSymlinkAncestors + lstatRefuseSymlink (redis_string.go:335)

The implementation is correct:

Step What it does
filepath.Rel(cleanRoot, cleanTarget) Computes relative path; escape-to-parent detected and refused
lstatRefuseSymlink(cleanRoot) Checks the root itself first
Loop over strings.Split(rel, sep) Walks each component, Lstat-checking each joined path
if seg == "" { continue } Handles degenerate splits from filepath.Clean'd paths

lstatRefuseSymlink's "return nil for non-existent path" is correct because ensureDir calls MkdirAll immediately before the check — any missing directory is impossible unless MkdirAll itself failed (and that failure is caught). The TOCTOU disclosure in the comment is the right acknowledgement.

TestRedisDB_RefusesSymlinkedAncestor pre-places <root>/redis/db_0/strings → bait-tree, drives HandleString, asserts the error message, and verifies the bait is untouched. ✓

P2 — Platform-specific isDiskFullError (disk_full_unix.go, disk_full_windows.go)

  • Unix: errors.Is(err, syscall.ENOSPC) — correct.
  • Windows: covers windows.ERROR_DISK_FULL (112), windows.ERROR_HANDLE_DISK_FULL (39), and syscall.ENOSPC (for Go's translated case). The three-case switch is clean and exhaustive.
  • golang.org/x/sys v0.43.0 is confirmed in go.mod — no new dependency. ✓

TestRedisDB_IsBlobAtomicWriteOutOfSpace verifies that both a raw and errors.Join-wrapped ENOSPC round-trip through IsBlobAtomicWriteOutOfSpace. ✓


Residual Item (Carried from Round 5 — Not Yet Addressed)

Medium: open_nofollow_windows.go returns a bare OS error (open_nofollow_windows.go:25)

// Windows — the os.OpenFile error is not wrapped:
return os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600)

The repo's wrapcheck linter enforces cockroachdberr.WithStack at all OS/package boundaries. This file escapes the check only because Linux CI does not cross-compile for GOOS=windows. Any future GOOS=windows golangci-lint step will flag this. One-line fix:

Fix this →


Additional Findings

Low / Nits

1. TestRedisDB_RefusesSymlinkedAncestor tests leaf-level only (redis_string_test.go:209)

The test places the symlink trap at <root>/redis/db_0/strings (one level below dbDir()). The assertNoSymlinkAncestors loop covers all levels including <root>/redis and <root>/redis/db_0, but these higher-level cases are not exercised by the test. The logic is the same for any level (the same lstatRefuseSymlink call), so this is a coverage note rather than a bug.

2. openJSONL bypasses ensureDir for the JSONL parent directory (redis_string.go:530)

func openJSONL(path string) (*jsonlFile, error) {
    if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil { // direct MkdirAll — no ancestor check
        ...
    }

openJSONL calls os.MkdirAll directly rather than routing through ensureDir. In practice this is safe because HandleString/HandleHLL always call writeBlobensureDir before any appendTTL invocation — the ancestor check for the JSONL parent directory (dbDir()) is implicitly covered by the blob ensureDir call for the sibling subdir (dbDir()/strings, dbDir()/hll). But this ordering invariant is implicit and undocumented in openJSONL. A one-line comment noting the dependency — or threading openJSONL through ensureDir for its parent — would make the contract explicit for a future refactor.

3. disk_full_windows.go Windows error codes are untestable from Linux CI (disk_full_windows.go)

windows.ERROR_DISK_FULL and windows.ERROR_HANDLE_DISK_FULL are not exercised by any test in this PR (inherent constraint: //go:build windows code is excluded from Linux CI). This is the same limitation as open_nofollow_windows.go. Acceptable for a backup tool targeting server environments; worth noting for completeness.


Pass Summaries

  1. Data lossassertNoSymlinkAncestors is called after MkdirAll succeeds, before dirsCreated is populated. Pre-existing ancestor symlinks are caught; newly swapped ones require the TOCTOU window (documented). openJSONL's direct MkdirAll bypasses the ancestor check but is safe due to call ordering (see finding Update dependency go to v1.21.4 #2). No new data-loss vectors.

  2. Concurrency — Single-threaded by design. assertNoSymlinkAncestors and lstatRefuseSymlink are pure functions with no shared state. No issues.

  3. PerformanceassertNoSymlinkAncestors is called once per unique directory (gated by dirsCreated cache), O(path depth) Lstat calls per directory. Negligible for a backup tool. isDiskFullError is O(1). No regressions.

  4. Data consistency — Magic-byte constants, deduplication logic, KEYMAP emission, and orphan-TTL counting are all unchanged. golang.org/x/sys confirmed in go.mod. ✓

  5. Test coverage — 21 tests. TestRedisDB_RefusesSymlinkedAncestor exercises the new ancestor-check path end-to-end. TestRedisDB_IsBlobAtomicWriteOutOfSpace covers ENOSPC and wrapped ENOSPC. Windows disk-full codes are untested (inherent platform gap, acceptable).


Overall: The two Round 7 Codex items are correctly implemented and each has a dedicated regression test. The single remaining open item is the unwrapped Windows OS error from Round 5 — a one-line fix or explicit deferral is all that stands between this and merge-ready.

| Branch |

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e061b6d088

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

// mounting a successful attack on the dump tree there already
// requires write access plus SeCreateSymbolicLinkPrivilege.
// Codex P1 round 6 (atomic open) + P2 round 7 (Windows build).
f, err := openSidecarFile(path)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Provide non-Unix/Windows fallbacks for backup OS helpers

This change introduces calls to openSidecarFile and isDiskFullError, but both helpers are only implemented in *_unix.go and *_windows.go; targets that are neither (for example GOOS=js and GOOS=wasip1) now fail to compile with undefined symbol errors. I verified this by running GOOS=js GOARCH=wasm go test -c ./internal/backup and GOOS=wasip1 GOARCH=wasm go test -c ./internal/backup, both of which fail at these call sites, whereas the parent commit compiles for those targets.

Useful? React with 👍 / 👎.

Codex P2 round 10 (commit e061b6d):

Round 5 + round 7 split openSidecarFile and isDiskFullError into
unix and windows build-tagged files, but the union doesn't cover
every Go target. GOOS=js GOARCH=wasm and GOOS=wasip1 GOARCH=wasm
now fail with `undefined: openSidecarFile` / `undefined:
isDiskFullError` at compile time, which Codex reproduced via
`GOOS=js go test -c ./internal/backup`.

Add `_other.go` files behind `//go:build !unix && !windows`:

  - openSidecarFile falls back to Lstat-then-OpenFile. Those
    targets either lack syscall.O_NOFOLLOW (wasm runtimes) or have
    a different filesystem model (plan9), and the local-adversary
    threat that motivated the unix nlink+atomic-open guard does
    not apply to offline / sandboxed dump tooling.
  - isDiskFullError returns false. wasm has no disk; plan9 uses a
    different error vocabulary. The helper's contract treats
    unrecognised errors as non-retriable, so false is the safe
    default.

Verified:
  - GOOS=js GOARCH=wasm go build ./internal/backup/...   clean.
  - GOOS=wasip1 GOARCH=wasm go build ./internal/backup/...   clean.
  - GOOS=windows GOARCH=amd64 go build ./internal/backup/...   clean.
  - go test -race ./internal/backup/...   clean.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 30, 2026

@claude review

Round 8 fix for Codex P2 round 10 on commit e061b6d (commit 20236ac4):

The unix/windows split for openSidecarFile and isDiskFullError did not cover GOOS=js GOARCH=wasm or GOOS=wasip1 GOARCH=wasm, so cross-compilation failed with undefined: openSidecarFile / undefined: isDiskFullError. Codex reproduced via GOOS=js go test -c ./internal/backup.

Added open_nofollow_other.go and disk_full_other.go behind //go:build !unix && !windows:

  • openSidecarFile falls back to Lstat-then-OpenFile (the wasm/plan9 threat model doesn't include the local-adversary scenario the unix nlink+atomic-open guard targets).
  • isDiskFullError returns false (wasm has no disk; plan9 has a different error vocabulary; the helper contract treats unrecognised errors as non-retriable).

Verified: GOOS=js GOARCH=wasm go build ./internal/backup/..., GOOS=wasip1 GOARCH=wasm go build ./internal/backup/..., GOOS=windows go build, and go test -race ./internal/backup/... all clean. Lint clean. Fix merged forward into #714 / #716 / #718.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 1m 54s —— View job


Review of PR #713 — Round 8 (commit 20236ac)

  • Gather context (diff, changed files)
  • Read open_nofollow_other.go and disk_full_other.go
  • Pass 1: Data loss
  • Pass 2: Concurrency / distributed failures
  • Pass 3: Performance
  • Pass 4: Data consistency
  • Pass 5: Test coverage
  • Verify Codex P2 round 10 (non-unix/windows fallback build targets)

Codex P2 Round 10 — Non-Unix/Windows Fallback: ✓ Cleanly Fixed

disk_full_other.go — Returning false is the correct safe default for wasm/plan9. The helper contract is documented ("unrecognised errors are non-retriable") and the rationale in the comment is accurate. ✓

open_nofollow_other.go — Lstat-then-OpenFile matches open_nofollow_windows.go's pattern, documented TOCTOU window is acceptable for the offline/sandboxed threat model of those targets. ✓

Build tag coverage//go:build !unix && !windows correctly mops up all remaining Go targets beyond the unix/windows split: js, wasip1, plan9, and future ports. The three-way cover is exhaustive. ✓


Additional Findings

Medium (Carried + Extended from Round 5)

1. Unwrapped os.OpenFile return in open_nofollow_windows.go AND open_nofollow_other.go

Both files return the os.OpenFile error bare, without cockroachdberr.WithStack:

// open_nofollow_windows.go:25
return os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600) //nolint:gosec,mnd

// open_nofollow_other.go:24 — same pattern
return os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600) //nolint:gosec,mnd

The //nolint:gosec,mnd comment suppresses gosec and mnd but not wrapcheck. The repo's .golangci.yaml enables wrapcheck (line 27) without a cross-platform build-tag exemption, so both files would be flagged by wrapcheck on any toolchain that includes them. Linux CI avoids the issue only because these files are excluded by their build tags.

The unix counterpart wraps correctly: return nil, cockroachdberr.WithStack(err).

One-line fix for both files:

Fix open_nofollow_windows.go →

Fix open_nofollow_other.go →


Low / Nits

2. _ = err in disk_full_other.go could be a named-blank in the signature (disk_full_other.go:12)

func isDiskFullError(err error) bool {
    _ = err   // suppress unused-parameter warning
    return false
}

The same effect is achieved more idiomatically by naming the parameter _ directly:

func isDiskFullError(_ error) bool {
    return false
}

Purely cosmetic — both compile and lint cleanly — but the _ = form is slightly unusual compared to Go convention.


Pass Summaries

  1. Data loss — No new data-loss vectors. The fallback openSidecarFile uses O_TRUNC without O_NOFOLLOW, which is consistent with the Windows variant and has the same documented TOCTOU caveat. The isDiskFullError → false path means that a wasm-runtime write error would be treated as non-retriable by callers — correct and documented.
  2. Concurrency — Pure functions, no shared state. No issues.
  3. Performance — One extra Lstat syscall on the non-unix/windows path per sidecar open; negligible.
  4. Data consistency!unix && !windows tag is exhaustive. The three-way split (unix / windows / other) has no gaps. ✓
  5. Test coverage — The fallback implementations are intentionally untested in Linux CI (build-excluded). The existing 21 tests cover the primary unix path. Gap is inherent and acceptable; no regressions.

Overall: Round 8 cleanly resolves the GOOS=js / GOOS=wasip1 build regression. The implementation is correct and the rationale comments match the threat model of those targets. The single remaining open item — the unwrapped os.OpenFile error in windows.go and other.go (now affecting both) — is a one- or two-line fix that would close wrapcheck exposure on those platforms. Ready to merge pending that fix or explicit deferral.
| Branch

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 20236ac43b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/backup/open_nofollow_unix.go Outdated
Comment on lines +33 to +34
const flag = os.O_WRONLY | os.O_CREATE | syscall.O_NOFOLLOW
f, err := os.OpenFile(path, flag, 0o600) //nolint:gosec,mnd // path is composed from output-root + fixed file name; 0600 is the standard owner-only mode
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject non-regular sidecar paths before opening

openSidecarFile opens the TTL sidecar with os.O_WRONLY|os.O_CREATE before checking what kind of file it is, so a pre-existing FIFO at strings_ttl.jsonl/hll_ttl.jsonl will block the backup indefinitely when no reader is attached. The current symlink and hard-link checks do not catch this case (mkfifo produces nlink=1 and is not a symlink), so a local adversary or stale artifact in the output directory can hang the first TTL write.

Useful? React with 👍 / 👎.

Codex P2 round 11 (commit 20236ac):

`openSidecarFile` opened the TTL sidecar with
`O_WRONLY|O_CREATE|O_NOFOLLOW` and only checked the resulting
descriptor's link count. A pre-existing FIFO at
strings_ttl.jsonl / hll_ttl.jsonl would block the first write
indefinitely (POSIX: opening a reader-less FIFO with O_WRONLY
hangs until a reader attaches). The symlink and hard-link guards
do not catch this case: `mkfifo` produces nlink=1 and is not a
symlink.

Add O_NONBLOCK to the unix open so a reader-less FIFO surfaces
ENXIO immediately instead of hanging, then enforce
info.Mode().IsRegular() after fstat. Reader-attached FIFOs,
sockets, and character/block devices are all refused with a
stable "non-regular file" error. The Windows path is unchanged
because Windows has no FIFO concept.

Test: TestRedisDB_OpenJSONLRefusesFIFO lives in
redis_string_unix_test.go behind `//go:build unix` because
syscall.Mkfifo is undefined on Windows and js/wasm.

Verified clean: linux race tests, GOOS=windows GOARCH=amd64
build, GOOS=js GOARCH=wasm build, golangci-lint.
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 30, 2026

@claude review

Round 9 fix for Codex P2 round 11 on commit 20236ac (commit 1a5165e2):

openSidecarFile opened the TTL sidecar with O_WRONLY|O_CREATE|O_NOFOLLOW and only checked the descriptor's link count after open. A pre-existing FIFO at strings_ttl.jsonl / hll_ttl.jsonl would block the first write indefinitely — POSIX: opening a reader-less FIFO with O_WRONLY hangs until a reader attaches. The symlink and hard-link guards don't catch this: mkfifo produces nlink=1 and is not a symlink.

Added O_NONBLOCK to the unix open so a reader-less FIFO surfaces ENXIO immediately instead of hanging, then enforce info.Mode().IsRegular() after fstat. Reader-attached FIFOs, sockets, and character/block devices all refuse with a stable "non-regular file" error. Windows path unchanged (no FIFO concept there).

Test: TestRedisDB_OpenJSONLRefusesFIFO lives in redis_string_unix_test.go behind //go:build unix because syscall.Mkfifo is undefined on Windows and js/wasm.

Verified clean: go test -race, GOOS=windows GOARCH=amd64 go build, GOOS=js GOARCH=wasm go build, golangci-lint. Fix merged forward into #714 / #716 / #718.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 3m 47s —— View job


Review of PR #713 — Round 9 (commit 1a5165e)

  • Gather context (diff, changed files)
  • Read open_nofollow_unix.go, redis_string_unix_test.go
  • Pass 1: Data loss
  • Pass 2: Concurrency / distributed failures
  • Pass 3: Performance
  • Pass 4: Data consistency
  • Pass 5: Test coverage
  • Verify Round 9 fix (FIFO/non-regular sidecar guard)

Codex P2 Round 11 — FIFO/Non-regular Sidecar Guard: ✓ Cleanly Fixed

The two-part fix is correct:

Part 1 — O_NONBLOCK prevents indefinite hang (open_nofollow_unix.go:42)

const flag = os.O_WRONLY | os.O_CREATE | syscall.O_NOFOLLOW | syscall.O_NONBLOCK

POSIX guarantees that open(O_WRONLY|O_NONBLOCK) on a reader-less FIFO returns ENXIO immediately rather than blocking. The errors.Is(err, syscall.ENXIO) branch on line 53 surfaces a stable human-readable message. The symlink (ELOOP) and hard-link guards (Nlink > 1) do not catch this case — a FIFO created by mkfifo produces nlink=1 and is not a symlink. ✓

Part 2 — IsRegular() catches reader-attached FIFOs and special files (open_nofollow_unix.go:69)

if !info.Mode().IsRegular() {
    _ = f.Close()
    return nil, cockroachdberr.WithStack(cockroachdberr.Newf(
        "backup: refusing to write to non-regular file at %s (mode=%s)", path, info.Mode()))
}

The IsRegular() check is placed before the Nlink > 1 check — correct ordering, because a FIFO has nlink=1 and the Nlink check would not catch it. This also refuses sockets and character/block devices. The mode=%s in the error message gives the operator the concrete file type without a separate stat call. ✓

Check-ordering summary:

Step What it catches
ELOOP on open Symlinks
ENXIO on open Reader-less FIFOs
!IsRegular() after fstat Reader-attached FIFOs, sockets, devices
Nlink > 1 Hard links to files outside dump tree
Truncate(0) Only reached for verified single-link regular files

All four threat vectors are now addressed with no gap between them. ✓

O_NONBLOCK effect on write path: O_NONBLOCK is retained on the returned *os.File fd, meaning subsequent buffered write(2) calls are technically non-blocking. For regular files on local block devices (ext4, XFS, Btrfs), the kernel never returns EAGAIN on write(2) — regular-file writes always go to the page cache. This is a no-op in practice and the IsRegular() check ensures only regular files are returned. ✓

Test: TestRedisDB_OpenJSONLRefusesFIFO (redis_string_unix_test.go:24)

The test creates a real FIFO via syscall.Mkfifo at the exact path openJSONL would open (redisStringsTTLFile), then drives HandleString with an expiring string (which triggers appendTTLopenJSONL). The error-message check accepts both "FIFO at" and "non-regular file" — correctly handling both the ENXIO path (reader-less) and the IsRegular path (reader-attached). The t.Skipf guard for platforms where mkfifo isn't available is the right defensive pattern. ✓


Residual Item (Carried from Round 5 — Still Open)

Medium: open_nofollow_windows.go and open_nofollow_other.go return bare os.OpenFile errors

Both files return the os.OpenFile result unwrapped:

// open_nofollow_windows.go:25 and open_nofollow_other.go:24
return os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600) //nolint:gosec,mnd

The //nolint:gosec,mnd comment suppresses gosec and mnd but not wrapcheck. The .golangci.yaml (line 27) enables wrapcheck without a cross-platform build-tag exemption. Linux CI avoids the issue only because these files are excluded by their build tags. Any GOOS=windows golangci-lint run (local dev on Windows, future CI step) would flag both lines.

The unix counterpart correctly wraps on every error path with cockroachdberr.WithStack.

Fix open_nofollow_windows.go →

Fix open_nofollow_other.go →


Pass Summaries

  1. Data loss — The four-gate open sequence (ELOOP / ENXIO / IsRegular / Nlink) is exhaustive for the known threat vectors against sidecar files. Truncate(0) is only reached for verified single-link regular files. writeFileAtomic (blob path) is unchanged. No new data-loss vectors.

  2. Concurrency — Single-threaded by design. O_NONBLOCK + IsRegular() are syscall-level primitives with no race window between them (the fstat is on the open fd, not a re-stat of the path). The TOCTOU between the IsRegular check and Truncate is not exploitable by swapping the path (the fd is already open against the inode). No issues.

  3. Performance — One extra fstat-on-fd and one mode flag check per sidecar open; sidecars are opened once per db encoder. Negligible.

  4. Data consistency — Magic-byte constants, deduplication, KEYMAP emission, and orphan-TTL counting are unchanged. The fix is purely in the sidecar file-open layer. ✓

  5. Test coverage — 22 tests (21 in redis_string_test.go + 1 in redis_string_unix_test.go). TestRedisDB_OpenJSONLRefusesFIFO covers the ENXIO path end-to-end. The IsRegular path for a reader-attached FIFO is not separately tested (would require a goroutine to hold the read end open), but the code is a two-line guard that directly follows the ENXIO block — the gap is minimal.


Overall: Round 9 is clean. The FIFO guard is correct, well-commented, and tested. The single remaining open item — the unwrapped error in windows.go / other.go — is a one-line fix in each file, and it's the same carry-over from Round 5. Either fix it now or defer with an explicit note; it's the last gap before this PR is clean across the board.

| Branch |

Base automatically changed from feat/backup-phase0a-keymap-manifest to main May 1, 2026 09:29
@bootjp bootjp mentioned this pull request May 1, 2026
4 tasks
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4872704685

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/backup/redis_string.go Outdated
Comment on lines 407 to 410
f, err := openJSONL(filepath.Join(r.dbDir(), baseName))
if err != nil {
return err
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Validate parent path before creating TTL sidecar files

Fresh evidence in this revision: writeBlob now uses ensureDir/assertNoSymlinkAncestors, but TTL sidecars still bypass that guard by calling openJSONL directly from appendTTL. Because openJSONL only applies MkdirAll(dir) plus openSidecarFile(path) (which protects only the final path element), a symlinked ancestor such as <outRoot>/redis/db_0 -> /tmp/outside will still redirect strings_ttl.jsonl/hll_ttl.jsonl writes outside the backup root.

Useful? React with 👍 / 👎.

Codex P1 round 13 (commit 4872704):

writeBlob already routed through ensureDir +
assertNoSymlinkAncestors (PR #713 round 7), but TTL sidecars
(appendTTL → openJSONL) bypassed that guard. openJSONL ran
`os.MkdirAll(filepath.Dir(path))` and then openSidecarFile —
which protects only the final path element. A symlinked
ancestor like `<outRoot>/redis/db_0 -> /tmp/outside` would still
redirect strings_ttl.jsonl / hll_ttl.jsonl writes outside the
dump root.

appendTTL now calls r.ensureDir(r.dbDir()) before openJSONL so
the same ancestor-walk protection applies.

Caller audit (per loop's audit-semantics rule):
  - openJSONL has two callers: appendTTL (this file, fixed)
    and sqs.go's flushQueue/flushInternals. The SQS path
    bypasses this RedisDB.ensureDir helper because SQS lives
    in a different encoder; that's tracked separately and is
    out-of-scope for this PR's redis fix.

Test: TestRedisDB_RefusesSymlinkedAncestorOnTTLSidecar pre-places
`<outRoot>/redis/db_0 -> /tmp/bait` and asserts HandleString
(which drives appendTTL on first TTL-bearing value) refuses with
the standard "refusing to traverse symlinked ancestor" error.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 2, 2026

@claude review

Round 13 fix for Codex P1 round 13 on commit 4872704 (commit 0383924b):

writeBlob already routed through ensureDir + assertNoSymlinkAncestors (round 7), but TTL sidecars (appendTTLopenJSONL) bypassed that guard. openJSONL ran os.MkdirAll(filepath.Dir(path)) then openSidecarFile — which protects only the final path element. A symlinked ancestor like <outRoot>/redis/db_0 -> /tmp/outside would still redirect strings_ttl.jsonl writes outside the dump root.

appendTTL now calls r.ensureDir(r.dbDir()) before openJSONL so the same ancestor-walk protection applies.

Caller audit (per loop's audit-semantics rule): openJSONL has two callers — appendTTL (this fix) and sqs.go's flushQueue/flushInternals. The SQS path bypasses RedisDB.ensureDir because SQS is a separate encoder; tracking the same hardening for SQS is out-of-scope for this PR's redis fix.

Test: TestRedisDB_RefusesSymlinkedAncestorOnTTLSidecar pre-places <outRoot>/redis/db_0 -> /tmp/bait and asserts HandleString (which drives appendTTL) refuses with the standard "refusing to traverse symlinked ancestor" error. go test -race, GOOS=windows go build, lint clean.

@claude
Copy link
Copy Markdown

claude Bot commented May 2, 2026

Claude finished @bootjp's task in 5m 20s —— View job


Review of PR #713 — Round 13 (commit 0383924b)

  • Read internal/backup/redis_string.go (appendTTL, ensureDir, openJSONL)
  • Read internal/backup/redis_string_test.go (new test)
  • Read open_nofollow_windows.go, open_nofollow_other.go
  • Pass 1: Data loss
  • Pass 2: Concurrency / distributed failures
  • Pass 3: Performance
  • Pass 4: Data consistency
  • Pass 5: Test coverage
  • Verify P1 (TTL sidecar ancestor-symlink guard via appendTTL)

Codex P1 Round 13 — Ancestor-Symlink Guard on TTL Sidecar: ✓ Correctly Fixed

The fix wires r.ensureDir(r.dbDir()) into appendTTL before openJSONL is called, inside the if *slot == nil block. The logic is correct:

Cache interaction: ensureDir is keyed on the exact directory string. writeBlob passes filepath.Join(dbDir(), subdir) (e.g. db_0/strings); appendTTL passes dbDir() (e.g. db_0). These are distinct cache keys, so appendTTL's ensureDir is not a cache hit from writeBlob's prior call — it independently runs MkdirAll(dbDir()) + assertNoSymlinkAncestors(outRoot, dbDir()). This is the correct design: it re-checks the ancestor chain for the sidecar's parent specifically.

Defense-in-depth value: In normal Pebble ordering (hll < str < ttl), appendTTL is always preceded by writeBlob for the same dbDir(), so writeBlob's ensureDir would catch a static symlink first. The appendTTL guard independently catches:

  1. A symlink introduced after writeBlob ran (TOCTOU defense, best-effort)
  2. Future callers that invoke appendTTL without a prior writeBlob
  3. Malformed snapshots where TTL records appear for keys with no value record (though today those go to orphanTTLCount via redisKindUnknown, not appendTTL)

openJSONL's internal MkdirAll is now a no-op (dir already exists from appendTTL's ensureDir), which is harmless redundancy.

Test TestRedisDB_RefusesSymlinkedAncestorOnTTLSidecar (redis_string_test.go:211)

The test correctly pre-places <root>/redis/db_0 as a symlink, drives HandleString with an inline-TTL value, and asserts the error message and that no file appeared in the bait tree. ✓

One note on test structure: HandleString calls writeBlob before appendTTL. With a symlink at db_0, writeBlob's ensureDir(<root>/redis/db_0/strings) fires first (checks db_0 as an ancestor) and returns the error — appendTTL is never reached. The test name implies it exercises the TTL sidecar guard specifically, but the protection that fires in this test is in writeBlob. The appendTTL-level guard is independently untestable via the public API without TOCTOU manipulation (which the code explicitly accepts as a limitation). This is a test naming/coverage nit only — the behavioral regression (symlink at db_0 is caught end-to-end) is correctly covered. The defense-in-depth rationale for the appendTTL guard is sound regardless.


Residual Item — Carried from Round 5 (9 consecutive rounds, still open)

Medium: open_nofollow_windows.go and open_nofollow_other.go return bare os.OpenFile errors (open_nofollow_windows.go:25, open_nofollow_other.go:24)

// Both files — same pattern:
return os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600) //nolint:gosec,mnd

The //nolint suppresses gosec and mnd but not wrapcheck. The unix counterpart wraps every OS error with cockroachdberr.WithStack. The wrapcheck linter in .golangci.yaml applies to all non-test Go files; these files escape it only because Linux CI excludes them via their build tags. A GOOS=windows golangci-lint step would flag both.

This has been noted in every review round since Round 5. Either a one-line fix per file or an explicit deferral note (e.g. //nolint:wrapcheck // deferred: GOOS=windows golangci-lint not in CI) would close this item.

Fix open_nofollow_windows.go →

Fix open_nofollow_other.go →


Pass Summaries

  1. Data lossappendTTL's ensureDir(dbDir()) correctly fires before openJSONL when the sidecar is first opened. dirsCreated cache keying ensures the check runs independently of writeBlob's check on the subdir. No new data-loss vectors.

  2. Concurrency — Single-threaded by design. ensureDir reads/writes dirsCreated sequentially. No issues.

  3. PerformanceensureDir(dbDir()) adds at most one MkdirAll (no-op, dir exists) + one Lstat-walk of path depth ≤ 4 per encoder lifetime (after which the cache fires). Negligible.

  4. Data consistency — All prior guards (O_NOFOLLOW, Nlink > 1, IsRegular, ancestor-walk for blobs) are unchanged. The sidecar ancestor guard now matches the blob ancestor guard. ✓

  5. Test coverage — 23 tests. TestRedisDB_RefusesSymlinkedAncestorOnTTLSidecar is a valid behavioral regression test. The appendTTL-level guard provides defense-in-depth for scenarios not exercised by the current public-API test surface (see note above).


Overall: Round 13 is correctly implemented. The appendTTL ancestor guard closes the path that Codex identified, with the right cache-key semantics (independent of writeBlob's cache entry). The one item to dispose of before merge is the wrapcheck gap in windows.go / other.go — fix in two lines or add a deferral comment.
| Branch

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0383924ba2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +317 to +318
if _, ok := r.dirsCreated[dir]; ok {
return nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Revalidate cached dirs before every write

ensureDir skips all checks once a path is present in dirsCreated, so a directory that was safe on the first write can later be replaced with a symlink and subsequent writes will bypass assertNoSymlinkAncestors. In that scenario (e.g., after one successful write, swap redis/db_0/strings to a symlink), later writeBlob/appendTTL calls can be redirected outside outRoot, which reintroduces the path-escape issue this hardening is meant to prevent.

Useful? React with 👍 / 👎.

Codex P1 round 13 follow-up (commit 0383924):

ensureDir's dirsCreated cache short-circuited BOTH the MkdirAll
syscall AND the assertNoSymlinkAncestors security check. After
one successful write, an adversary that can write the output
directory could swap the now-cached real dir for a symlink to
outside the dump root, and subsequent writeBlob/appendTTL calls
would follow it without re-validation — reintroducing the
path-escape vector that the round-9 ancestor check was meant to
close.

Restructure ensureDir so the cache only short-circuits MkdirAll;
assertNoSymlinkAncestors runs on every call. Repeated symlink
walks are an acceptable cost on the hot path because they touch
~5 ancestor entries per dump-root depth and are dominated by
write I/O anyway.

Caller audit (per loop's audit-semantics rule for behavior
change): three callers in redis_string.go (writeBlob, appendTTL,
recordIfFallback) all already check the error return and
propagate. No API change.

Test: TestRedisDB_EnsureDirRevalidatesAfterCachedSuccess seeds
the cache via a first HandleString, then swaps the cached real
dir for a symlink targeting <bait>/, then drives a second
HandleString and asserts the standard "refusing to traverse
symlinked ancestor" refusal plus that the bait dir is
untouched.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 2, 2026

@claude review

Round 14 fix for Codex P1 round 13 follow-up on commit 0383924 (commit 8e8f8dda):

ensureDir's dirsCreated cache short-circuited BOTH MkdirAll AND assertNoSymlinkAncestors. After one successful write, an adversary that can write the output directory could swap the cached real dir for a symlink to outside the dump root, and subsequent writeBlob / appendTTL calls would follow it without re-validation — reintroducing the path-escape vector that the round-9 ancestor check was meant to close.

Restructured ensureDir so the cache only short-circuits MkdirAll; assertNoSymlinkAncestors runs on every call. Repeated symlink walks are an acceptable cost on the hot path because they touch ~5 ancestor entries per dump-root depth and are dominated by write I/O.

Caller audit (per loop's audit-semantics rule for behavior change): three callers in redis_string.go (writeBlob, appendTTL, recordIfFallback) all already check the error return and propagate. No API change.

Test: TestRedisDB_EnsureDirRevalidatesAfterCachedSuccess seeds the cache via a first HandleString, then swaps the cached real dir for a symlink targeting <bait>/, then drives a second HandleString and asserts the standard "refusing to traverse symlinked ancestor" refusal plus that the bait dir is untouched. go test -race and golangci-lint clean.

@claude
Copy link
Copy Markdown

claude Bot commented May 2, 2026

Claude finished @bootjp's task in 2m 18s —— View job


Review of PR #713 — Round 14 (commit 8e8f8dda)

  • Read internal/backup/redis_string.go (ensureDir, writeBlob, appendTTL, recordIfFallback)
  • Read internal/backup/redis_string_test.go (new test)
  • Read open_nofollow_windows.go, open_nofollow_other.go
  • Pass 1: Data loss
  • Pass 2: Concurrency / distributed failures
  • Pass 3: Performance
  • Pass 4: Data consistency
  • Pass 5: Test coverage
  • Verify Round 14 fix (ensureDir cache only short-circuits MkdirAll, not assertNoSymlinkAncestors)

Codex P1 Round 13 Follow-up — ensureDir Re-validation: ✓ Correctly Fixed

The restructured ensureDir (redis_string.go:318) is correct:

func (r *RedisDB) ensureDir(dir string) error {
    // Always re-run the ancestor check; never skip on cache hit.
    if err := assertNoSymlinkAncestors(r.outRoot, dir); err != nil {
        return err
    }
    if _, ok := r.dirsCreated[dir]; ok {
        return nil  // only short-circuits MkdirAll
    }
    if err := os.MkdirAll(dir, 0o755); err != nil {
        return cockroachdberr.WithStack(err)
    }
    r.dirsCreated[dir] = struct{}{}
    return nil
}

The security invariant now holds across all three callers:

Caller dir argument assertNoSymlinkAncestors fires
writeBlob dbDir()/strings or dbDir()/hll On every blob write
appendTTL dbDir() On first TTL write per encoder lifetime
recordIfFallback dbDir() On first fallback key per encoder lifetime

Cache semantics are now correct: dirsCreated means "MkdirAll has been called for this dir, no need to repeat it" — not "this dir has been fully validated, no need to recheck". The security check and the performance optimisation are correctly decoupled.

TOCTOU posture is unchanged: The window between assertNoSymlinkAncestors returning and the actual writeFileAtomic / openSidecarFile remains. The code correctly documents this as best-effort. What the fix closes is the much wider window that existed between the first write (where the check ran) and all subsequent writes to the same cached dir (where it did not). That wider window is what Codex P1 round 13 identified.

TestRedisDB_EnsureDirRevalidatesAfterCachedSuccess (redis_string_test.go:210):

  • Seeds dirsCreated for <root>/redis/db_0/strings via the first HandleString
  • os.RemoveAll + os.Symlink swaps the real dir for a symlink to <bait>/
  • Second HandleStringwriteBlobensureDirassertNoSymlinkAncestors fires and returns the expected error
  • Asserts the bait dir is untouched ✓

The test exercises the writeBlob path. The appendTTL path's ensureDir(dbDir()) guard is a superset (if db_0/strings would be caught, db_0 would also be caught); no separate test is needed.


Residual Item — Carried from Round 5 (10 consecutive rounds, still open)

Medium: open_nofollow_windows.go and open_nofollow_other.go return bare os.OpenFile errors (open_nofollow_windows.go:25, open_nofollow_other.go:24)

// Both files — line 25 / 24 respectively:
return os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600) //nolint:gosec,mnd

The //nolint:gosec,mnd suppresses gosec and mnd but not wrapcheck. The unix counterpart wraps every OS error with cockroachdberr.WithStack; these two do not. A GOOS=windows golangci-lint step would flag both. Two-line fix per file:

Fix open_nofollow_windows.go →

Fix open_nofollow_other.go →


Pass Summaries

  1. Data lossassertNoSymlinkAncestors now runs before every write path regardless of cache state. A swapped-in symlink is caught on the next ensureDir call. The dirsCreated cache correctly persists MkdirAll suppression without persisting the security bypass. No new data-loss vectors.

  2. Concurrency — Single-threaded by design. assertNoSymlinkAncestors is a pure function (no shared state, only os.Lstat calls). dirsCreated reads and writes are sequential. No issues.

  3. PerformanceassertNoSymlinkAncestors now adds ~4 Lstat syscalls per blob write on the hot path (root + redis + db_N + subdir). For 10M keys this is ~40M Lstat calls. The comment documents this tradeoff: "dominated by write I/O" — each blob write already costs a CreateTemp + Write + Rename (~3 syscalls minimum). The Lstat overhead is roughly proportional. Acceptable for an offline backup tool.

  4. Data consistency — Cache semantics are correct. All three callers (writeBlob, appendTTL, recordIfFallback) benefit from both MkdirAll caching and per-call ancestor re-validation. The test verifies the re-validation fires after a cache-hit scenario. ✓

  5. Test coverage — 24 tests. TestRedisDB_EnsureDirRevalidatesAfterCachedSuccess cleanly covers the new behavior at the level that matters: the dirsCreated cache no longer bypasses the security check. All prior tests continue to pass (the happy path is unchanged).


Overall: Round 14 is a clean, well-targeted fix for the cache-bypass vector. The restructured ensureDir is the minimum necessary change, the doc comment is accurate, and the regression test is precise. The single remaining item to close before merge — present for 10 rounds — is the two-line wrapcheck fix in open_nofollow_windows.go and open_nofollow_other.go.

| Branch |

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8e8f8dda14

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

t.Fatal(err)
}
fifoPath := filepath.Join(dir, redisStringsTTLFile)
if err := syscall.Mkfifo(fifoPath, 0o600); err != nil {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Restrict FIFO test to GOOS targets that expose Mkfifo

This test is built on all unix targets, but syscall.Mkfifo is not defined on every one of them (e.g., GOOS=solaris), so go test ./internal/backup fails at compile time before the runtime t.Skip can execute. I verified this by cross-compiling tests for Solaris, which errors on this symbol, making the new test suite non-portable across supported Unix build tags.

Useful? React with 👍 / 👎.

…und 15)

Codex P2 round 14 (commit 8e8f8dd):

The round-9 FIFO test used `syscall.Mkfifo` behind a `//go:build
unix` tag, but the standard library's syscall package does not
define Mkfifo on every `unix`-tagged target — Solaris is the
documented gap. `GOOS=solaris go test -c ./internal/backup`
errored at compile time on the symbol, breaking
cross-Unix portability claimed by the build tag.

Switch to `golang.org/x/sys/unix.Mkfifo` which is defined
consistently across every `unix` target. The dependency is
already in go.mod (used by disk_full_windows.go for the
ERROR_DISK_FULL constants).

Verified:
  - GOOS=solaris GOARCH=amd64 go test -c ./internal/backup → clean.
  - go test -race on linux → clean.
  - golangci-lint → clean.

No production code change; test-only file.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 2, 2026

@claude review

Round 15 fix for Codex P2 round 14 on commit 8e8f8dd (commit 011b0318):

The round-9 FIFO test used syscall.Mkfifo behind //go:build unix, but the standard library's syscall package does not define Mkfifo on every unix-tagged target — Solaris is the documented gap. GOOS=solaris go test -c ./internal/backup errored at compile time on the symbol, breaking cross-Unix portability claimed by the build tag.

Switched to golang.org/x/sys/unix.Mkfifo which is defined consistently across every unix target. The dependency is already in go.mod (used by disk_full_windows.go for ERROR_DISK_FULL constants).

Verified:

  • GOOS=solaris GOARCH=amd64 go test -c ./internal/backup → clean (was the Codex repro).
  • go test -race ./internal/backup/... on linux → clean.
  • golangci-lint run → clean.

Test-only file; no production change. Fix merged forward into #714 / #716 / #725.

@claude
Copy link
Copy Markdown

claude Bot commented May 2, 2026

Claude finished @bootjp's task in 1m 55s —— View job


Review of PR #713 — Round 15 (commit 011b031)

  • Read changed files (redis_string_unix_test.go, open_nofollow_windows.go, open_nofollow_other.go)
  • Pass 1: Data loss
  • Pass 2: Concurrency / distributed failures
  • Pass 3: Performance
  • Pass 4: Data consistency
  • Pass 5: Test coverage
  • Verify Round 15 fix (golang.org/x/sys/unix.Mkfifo replacement)
  • Fix wrapcheck gap in open_nofollow_windows.go + open_nofollow_other.go (committed as eb5f3c2)

Codex P2 Round 14 — golang.org/x/sys/unix.Mkfifo Replacement: ✓ Correctly Fixed

The change is test-only. redis_string_unix_test.go now imports golang.org/x/sys/unix and calls unix.Mkfifo instead of syscall.Mkfifo.

Why the fix is correct:

syscall.Mkfifo golang.org/x/sys/unix.Mkfifo
Available on all //go:build unix targets ❌ (missing on Solaris) ✅ (consistent across all unix)
Already a go.mod dependency N/A (stdlib) ✅ (golang.org/x/sys v0.43.0)
Runtime no-op guard for unsupported platforms t.Skipf on error ✓ t.Skipf on error ✓

The t.Skipf guard is preserved and correctly handles the case where unix.Mkfifo compiles but is unavailable at runtime on a given platform. No behavior change on linux — the test exercises the same FIFO-refusal path as before. ✓


wrapcheck Gap — Fixed in eb5f3c2

This item was noted in every review round since Round 5 (11 consecutive rounds). Both open_nofollow_windows.go and open_nofollow_other.go returned the os.OpenFile error bare, escaping wrapcheck only because Linux CI excludes them via build tags.

Before (both files, line 25/24):

return os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600)

After (both files):

f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600)
if err != nil {
    return nil, cockroachdberr.WithStack(err)
}
return f, nil

This matches the unix counterpart's wrapping convention exactly. cockroachdberr was already imported in both files; no new dependencies. ✓


Pass Summaries

  1. Data loss — Test-only change; no production code touched. All prior data-loss guards unchanged. No new vectors.
  2. Concurrency — Single-threaded by design. Unchanged.
  3. Performance — No-op for the test change. The wrapcheck fix adds a stack capture on the error path only, which is not on the hot path. No regressions.
  4. Data consistency — No storage-layer changes. ✓
  5. Test coverageTestRedisDB_OpenJSONLRefusesFIFO is now portable to Solaris cross-compilation. All 24 prior tests remain intact.

No remaining open items. The PR is clean across all five review passes with no unresolved gaps. The 11-round wrapcheck carry-over has been closed in eb5f3c2.

| Branch

Closes the wrapcheck gap that's been noted in reviews since round 5:
open_nofollow_windows.go and open_nofollow_other.go were returning the
os.OpenFile error bare, matching none of the repo's cockroachdberr.WithStack
convention. Linux CI only compiles the unix variant so the linter never
fires there, but a GOOS=windows golangci-lint run would flag both.

Co-authored-by: Yoshiaki Ueda (bootjp) <bootjp@users.noreply.github.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/backup/redis_string.go (1)

284-299: 💤 Low value

Implementation is correct but document the negative-input behavior.

intToDecimal returns an empty string for negative inputs (the for v > 0 loop never executes). This is safe because callers (redisDBSegment, NewRedisDB) ensure non-negative values, but a brief doc comment would clarify the contract.

📝 Suggested doc clarification
 // intToDecimal is a tiny zero-allocation helper for non-negative ints.
 // Avoids the strconv import here just to format dbIndex.
+// Precondition: v >= 0; negative values return "".
 func intToDecimal(v int) string {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/backup/redis_string.go` around lines 284 - 299, Document that
intToDecimal expects a non-negative integer and that negative inputs yield an
empty string; update the comment above the intToDecimal function to state the
contract and mention callers (redisDBSegment, NewRedisDB) ensure non-negative
values, so this function intentionally does not handle negatives.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/backup/redis_string_test.go`:
- Around line 241-276: The test currently places a symlink at db_0 which causes
the blob-path ensureDir (invoked by HandleString) to fail early, so appendTTL
isn't exercised; change the setup so the symlink exists only on the TTL sidecar
path but not the blob path. Concretely: create a real db_0/strings directory
first (e.g. call db.HandleHLL or a blob write to create the blob dir), then
remove/replace db_0 with the symlink to bait, then invoke the TTL-writing code
path (e.g. db.HandleTTL or db.HandleString with a TTL-bearing value so appendTTL
is executed) and assert the call errors with "refusing to traverse symlinked
ancestor" and that bait/strings_ttl.jsonl is not created; references:
NewRedisDB, HandleString, HandleHLL, appendTTL, ensureDir, openJSONL.

---

Nitpick comments:
In `@internal/backup/redis_string.go`:
- Around line 284-299: Document that intToDecimal expects a non-negative integer
and that negative inputs yield an empty string; update the comment above the
intToDecimal function to state the contract and mention callers (redisDBSegment,
NewRedisDB) ensure non-negative values, so this function intentionally does not
handle negatives.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b880ab0e-ea3b-4824-9536-e0669ea96368

📥 Commits

Reviewing files that changed from the base of the PR and between a564c60 and 011b031.

📒 Files selected for processing (9)
  • internal/backup/disk_full_other.go
  • internal/backup/disk_full_unix.go
  • internal/backup/disk_full_windows.go
  • internal/backup/open_nofollow_other.go
  • internal/backup/open_nofollow_unix.go
  • internal/backup/open_nofollow_windows.go
  • internal/backup/redis_string.go
  • internal/backup/redis_string_test.go
  • internal/backup/redis_string_unix_test.go

Comment on lines +241 to +276
// TestRedisDB_RefusesSymlinkedAncestorOnTTLSidecar is the regression
// for Codex P1 round 13 (PR #713): writeBlob already routed through
// ensureDir/assertNoSymlinkAncestors, but TTL sidecars (appendTTL ->
// openJSONL) bypassed that guard because openJSONL only protected
// the final path element via openSidecarFile. A symlinked ancestor
// like `<outRoot>/redis/db_0 -> /tmp/outside` would then redirect
// strings_ttl.jsonl writes outside the dump root. appendTTL now
// calls ensureDir on the parent directory before opening the
// sidecar.
func TestRedisDB_RefusesSymlinkedAncestorOnTTLSidecar(t *testing.T) {
t.Parallel()
root := t.TempDir()
bait := filepath.Join(root, "bait-tree")
if err := os.MkdirAll(bait, 0o755); err != nil {
t.Fatal(err)
}
parent := filepath.Join(root, "redis")
if err := os.MkdirAll(parent, 0o755); err != nil {
t.Fatal(err)
}
// Symlink `<outRoot>/redis/db_0` to a directory outside the
// dump root. Without the ancestor guard, the first appendTTL
// would happily write strings_ttl.jsonl into <bait>/.
if err := os.Symlink(bait, filepath.Join(parent, "db_0")); err != nil {
t.Fatal(err)
}
db := NewRedisDB(root, 0)
// HandleString with a TTL-bearing value drives appendTTL.
err := db.HandleString([]byte("k"), encodeNewStringValue(t, []byte("v"), fixedExpireMs))
if err == nil || !strings.Contains(err.Error(), "refusing to traverse symlinked ancestor") {
t.Fatalf("expected symlinked-ancestor refusal, got %v", err)
}
if _, statErr := os.Stat(filepath.Join(bait, "strings_ttl.jsonl")); !os.IsNotExist(statErr) {
t.Fatalf("TTL sidecar written through ancestor symlink: stat err=%v", statErr)
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

TestRedisDB_RefusesSymlinkedAncestorOnTTLSidecar doesn't exercise the TTL-specific fix it claims.

The symlink is placed at root/redis/db_0, which is an ancestor of both the blob directory (db_0/strings/) and the TTL sidecar (db_0/strings_ttl.jsonl). HandleString first calls ensureDir for the blob path; that call detects db_0 as a symlink and returns early — the appendTTL path is never reached. Reverting the ensureDir call added to appendTTL would leave this test green.

To specifically cover the regression, the symlinked ancestor must be on the TTL sidecar's path but not on the blob path. One approach:

  1. Pre-create a real db_0/strings/ dir (so a first HandleHLL write succeeds).
  2. Then replace db_0 with a symlink.
  3. Call HandleTTL for the HLL key — only the TTL path hits the freshly symlinked ancestor.
🧪 Suggested alternative test setup
-func TestRedisDB_RefusesSymlinkedAncestorOnTTLSidecar(t *testing.T) {
+func TestRedisDB_RefusesSymlinkedAncestorOnTTLSidecarViaHandleTTL(t *testing.T) {
 	t.Parallel()
 	root := t.TempDir()
 	bait := filepath.Join(root, "bait-tree")
 	if err := os.MkdirAll(bait, 0o755); err != nil {
 		t.Fatal(err)
 	}
-	parent := filepath.Join(root, "redis")
-	if err := os.MkdirAll(parent, 0o755); err != nil {
-		t.Fatal(err)
-	}
-	if err := os.Symlink(bait, filepath.Join(parent, "db_0")); err != nil {
-		t.Fatal(err)
-	}
 	db := NewRedisDB(root, 0)
-	// HandleString with a TTL-bearing value drives appendTTL.
-	err := db.HandleString([]byte("k"), encodeNewStringValue(t, []byte("v"), fixedExpireMs))
+	// First, let HandleHLL succeed so the blob path is established clean.
+	if err := db.HandleHLL([]byte("hll-k"), []byte{0x01}); err != nil {
+		t.Fatal(err)
+	}
+	// Adversary replaces db_0 with a symlink after blobs are already written.
+	dbDir := filepath.Join(root, "redis", "db_0")
+	if err := os.RemoveAll(dbDir); err != nil {
+		t.Fatal(err)
+	}
+	if err := os.Symlink(bait, dbDir); err != nil {
+		t.Fatal(err)
+	}
+	// HandleTTL drives appendTTL → ensureDir on db_0 → detects symlink.
+	err := db.HandleTTL([]byte("hll-k"), encodeTTLValue(fixedExpireMs))
 	if err == nil || !strings.Contains(err.Error(), "refusing to traverse symlinked ancestor") {
 		t.Fatalf("expected symlinked-ancestor refusal, got %v", err)
 	}
 	if _, statErr := os.Stat(filepath.Join(bait, "strings_ttl.jsonl")); !os.IsNotExist(statErr) {
-		t.Fatalf("TTL sidecar written through ancestor symlink: stat err=%v", statErr)
+		t.Fatalf("TTL sidecar written through ancestor symlink via HandleTTL: stat err=%v", statErr)
 	}
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// TestRedisDB_RefusesSymlinkedAncestorOnTTLSidecar is the regression
// for Codex P1 round 13 (PR #713): writeBlob already routed through
// ensureDir/assertNoSymlinkAncestors, but TTL sidecars (appendTTL ->
// openJSONL) bypassed that guard because openJSONL only protected
// the final path element via openSidecarFile. A symlinked ancestor
// like `<outRoot>/redis/db_0 -> /tmp/outside` would then redirect
// strings_ttl.jsonl writes outside the dump root. appendTTL now
// calls ensureDir on the parent directory before opening the
// sidecar.
func TestRedisDB_RefusesSymlinkedAncestorOnTTLSidecar(t *testing.T) {
t.Parallel()
root := t.TempDir()
bait := filepath.Join(root, "bait-tree")
if err := os.MkdirAll(bait, 0o755); err != nil {
t.Fatal(err)
}
parent := filepath.Join(root, "redis")
if err := os.MkdirAll(parent, 0o755); err != nil {
t.Fatal(err)
}
// Symlink `<outRoot>/redis/db_0` to a directory outside the
// dump root. Without the ancestor guard, the first appendTTL
// would happily write strings_ttl.jsonl into <bait>/.
if err := os.Symlink(bait, filepath.Join(parent, "db_0")); err != nil {
t.Fatal(err)
}
db := NewRedisDB(root, 0)
// HandleString with a TTL-bearing value drives appendTTL.
err := db.HandleString([]byte("k"), encodeNewStringValue(t, []byte("v"), fixedExpireMs))
if err == nil || !strings.Contains(err.Error(), "refusing to traverse symlinked ancestor") {
t.Fatalf("expected symlinked-ancestor refusal, got %v", err)
}
if _, statErr := os.Stat(filepath.Join(bait, "strings_ttl.jsonl")); !os.IsNotExist(statErr) {
t.Fatalf("TTL sidecar written through ancestor symlink: stat err=%v", statErr)
}
}
// TestRedisDB_RefusesSymlinkedAncestorOnTTLSidecarViaHandleTTL is the regression
// for Codex P1 round 13 (PR `#713`): writeBlob already routed through
// ensureDir/assertNoSymlinkAncestors, but TTL sidecars (appendTTL ->
// openJSONL) bypassed that guard because openJSONL only protected
// the final path element via openSidecarFile. A symlinked ancestor
// like `<outRoot>/redis/db_0 -> /tmp/outside` would then redirect
// strings_ttl.jsonl writes outside the dump root. appendTTL now
// calls ensureDir on the parent directory before opening the
// sidecar.
func TestRedisDB_RefusesSymlinkedAncestorOnTTLSidecarViaHandleTTL(t *testing.T) {
t.Parallel()
root := t.TempDir()
bait := filepath.Join(root, "bait-tree")
if err := os.MkdirAll(bait, 0o755); err != nil {
t.Fatal(err)
}
db := NewRedisDB(root, 0)
// First, let HandleHLL succeed so the blob path is established clean.
if err := db.HandleHLL([]byte("hll-k"), []byte{0x01}); err != nil {
t.Fatal(err)
}
// Adversary replaces db_0 with a symlink after blobs are already written.
dbDir := filepath.Join(root, "redis", "db_0")
if err := os.RemoveAll(dbDir); err != nil {
t.Fatal(err)
}
if err := os.Symlink(bait, dbDir); err != nil {
t.Fatal(err)
}
// HandleTTL drives appendTTL → ensureDir on db_0 → detects symlink.
err := db.HandleTTL([]byte("hll-k"), encodeTTLValue(fixedExpireMs))
if err == nil || !strings.Contains(err.Error(), "refusing to traverse symlinked ancestor") {
t.Fatalf("expected symlinked-ancestor refusal, got %v", err)
}
if _, statErr := os.Stat(filepath.Join(bait, "strings_ttl.jsonl")); !os.IsNotExist(statErr) {
t.Fatalf("TTL sidecar written through ancestor symlink via HandleTTL: stat err=%v", statErr)
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/backup/redis_string_test.go` around lines 241 - 276, The test
currently places a symlink at db_0 which causes the blob-path ensureDir (invoked
by HandleString) to fail early, so appendTTL isn't exercised; change the setup
so the symlink exists only on the TTL sidecar path but not the blob path.
Concretely: create a real db_0/strings directory first (e.g. call db.HandleHLL
or a blob write to create the blob dir), then remove/replace db_0 with the
symlink to bait, then invoke the TTL-writing code path (e.g. db.HandleTTL or
db.HandleString with a TTL-bearing value so appendTTL is executed) and assert
the call errors with "refusing to traverse symlinked ancestor" and that
bait/strings_ttl.jsonl is not created; references: NewRedisDB, HandleString,
HandleHLL, appendTTL, ensureDir, openJSONL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant