Skip to content

Commit ec358ab

Browse files
committed
backup: PR810 r4 — codex r4 P2 + gemini r1 mediums
Three remaining reviewer findings on PR #810 (one P2, two mediums) that the prior rounds did not address. None is a fail-closed semantic change; together they tighten the CHECKSUMS on-disk format and the cmd's manifest write path against operational shapes the design promises to support. 1. (codex r4 P2) WriteChecksums wrote filenames verbatim into a newline-delimited file. A path containing `\n`, `\r`, or `\\` would corrupt the line-based output and let a single adversarial entry inject a second CHECKSUMS row that the verifier would accept. Fix: formatChecksumLine implements the GNU coreutils sha256sum(1) escape convention — leading `\` on the line plus `\n` / `\r` / `\\` substitutions. splitChecksumLine reverses the convention via unescapeChecksumPath at parse time. `sha256sum -c CHECKSUMS` from the dump root parses both forms identically, preserving the vendor-independent recovery property. Tests: - TestChecksumLine_EscapeRoundTrip — five paths covering `\n`, `\r`, `\\`, multiple escapes mixed, and the no-escape passthrough. - TestChecksumLine_EscapePrefixMatchesGNUFormat — pins the exact `\` + hex + ` ` + escaped-body shape so a future change cannot drift from sha256sum compatibility. - TestSplitChecksumLine_RejectsDanglingEscape — guards against a tampered CHECKSUMS using `\<EOL>` or `\x` for unknown `x`. 2. (gemini r1 medium, main.go:281) emitManifest concatenated `cfg.outputRoot + "/MANIFEST.json"`. On Windows this produces wrong path separators and would refuse a Windows-style output root. Fix: filepath.Join. Standard cross-platform shape. 3. (gemini r1 medium, main.go:285) emitManifest deferred a `_ = out.Close()` that swallowed the Close error. On NFS / FUSE filesystems Close is the authoritative durability signal (writeback errors surface there, not at Write time), so a silently-discarded Close error means the dump-tree invariant (MANIFEST.json on disk) can fail without the cmd surfacing it. Fix: explicit Sync() then Close() with both errors propagated; the defer'd close on error paths runs through `_ = out.Close()` because we have already surfaced the primary failure. Caller audit: - formatChecksumLine / unescapeChecksumPath are new helpers called only from WriteChecksums / splitChecksumLine. No cross-package use. - splitChecksumLine's exported behavior is rejection-side tightening (dangling escape) plus correct round-trip of escape-needing paths. No production caller of VerifyChecksums exists (test-only). - emitManifest is internal to cmd/elastickv-snapshot-decode. The Close-error surface is purely additive — previously- successful runs stay successful. Self-review: 1. Data loss — none; the new escape path PREVENTS data loss (line injection); the Close-error surface PREVENTS silent manifest-write loss on NFS. 2. Concurrency — none. 3. Performance — single Builder per escape-needing line; the no-escape fast path is unchanged. 4. Data consistency — sha256sum(1) escape round-trip is byte-identical (round-trip test pins it). 5. Test coverage — three new tests; existing seven CHECKSUMS tests still pass.
1 parent b05ec87 commit ec358ab

3 files changed

Lines changed: 191 additions & 8 deletions

File tree

cmd/elastickv-snapshot-decode/main.go

Lines changed: 24 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,7 @@ import (
2222
"io"
2323
"log/slog"
2424
"os"
25+
"path/filepath"
2526
"strings"
2627
"time"
2728

@@ -278,15 +279,35 @@ func emitManifest(cfg *config, res backup.DecodeResult) error {
278279
if cfg.bundleJSONL {
279280
m.DynamoDBLayout = backup.DynamoDBLayoutJSONL
280281
}
281-
out, err := os.Create(cfg.outputRoot + "/MANIFEST.json") //nolint:gosec // operator-supplied path
282+
// filepath.Join over `+ "/"` so the manifest path uses the
283+
// platform separator (gemini r1 medium on PR #810).
284+
out, err := os.Create(filepath.Join(cfg.outputRoot, "MANIFEST.json")) //nolint:gosec // operator-supplied path
282285
if err != nil {
283286
return errors.WithStack(err)
284287
}
285-
defer func() { _ = out.Close() }()
288+
// Surface Close errors instead of swallowing them: on a
289+
// filesystem that delays writeback until Close (NFS, some
290+
// FUSE mounts), the write succeeds but the buffered-data
291+
// flush failure shows up only at Close time. Without this
292+
// surface, a manifest that was never durably written would
293+
// pass the cmd's error-return contract and the dump-tree
294+
// invariant ("MANIFEST.json is on disk") would silently
295+
// fail. Gemini r1 medium on PR #810. Sync runs before Close
296+
// so the explicit fsync is the authoritative durability
297+
// signal; Close's role is just to surface the kernel's
298+
// per-fd cleanup result.
286299
if err := backup.WriteManifest(out, m); err != nil {
300+
_ = out.Close()
287301
return errors.WithStack(err)
288302
}
289-
return errors.WithStack(out.Sync())
303+
if err := out.Sync(); err != nil {
304+
_ = out.Close()
305+
return errors.WithStack(err)
306+
}
307+
if err := out.Close(); err != nil {
308+
return errors.WithStack(err)
309+
}
310+
return nil
290311
}
291312

292313
// populateAdapterScopes returns a non-nil pointer for every enabled

internal/backup/checksums.go

Lines changed: 90 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -74,13 +74,48 @@ func WriteChecksums(root string) error {
7474
}
7575
defer func() { _ = out.Close() }()
7676
for _, e := range entries {
77-
if _, err := fmt.Fprintf(out, "%s %s\n", e.hex, e.relPath); err != nil {
77+
line, prefix := formatChecksumLine(e.hex, e.relPath)
78+
if _, err := fmt.Fprintf(out, "%s%s\n", prefix, line); err != nil {
7879
return errors.WithStack(err)
7980
}
8081
}
8182
return errors.WithStack(out.Sync())
8283
}
8384

85+
// formatChecksumLine builds one CHECKSUMS row using GNU coreutils
86+
// sha256sum(1) escape conventions. When the path contains `\n`,
87+
// `\r`, or `\\`, sha256sum prefixes the entire line with `\\` and
88+
// replaces the offending bytes with `\\n` / `\\r` / `\\\\`. Phase 0a
89+
// emits the same shape so `sha256sum -c CHECKSUMS` from the dump
90+
// root parses a tampered or path-noisy filename without corrupting
91+
// the line-based output (codex r4 P2 on PR #810: the previous
92+
// `%s %s\n` print would let a filename containing `\n` inject a
93+
// fake CHECKSUMS row).
94+
//
95+
// Returns (line-body, prefix). The prefix is either "" or "\\";
96+
// callers concatenate `prefix + body + "\n"` exactly like the
97+
// reference implementation.
98+
func formatChecksumLine(hex, relPath string) (string, string) {
99+
if !strings.ContainsAny(relPath, "\n\r\\") {
100+
return hex + " " + relPath, ""
101+
}
102+
var sb strings.Builder
103+
sb.Grow(len(relPath) + 4) //nolint:mnd // small heuristic to avoid early grow
104+
for _, r := range relPath {
105+
switch r {
106+
case '\n':
107+
sb.WriteString(`\n`)
108+
case '\r':
109+
sb.WriteString(`\r`)
110+
case '\\':
111+
sb.WriteString(`\\`)
112+
default:
113+
sb.WriteRune(r)
114+
}
115+
}
116+
return hex + " " + sb.String(), `\`
117+
}
118+
84119
// checksumEntry is one row of the CHECKSUMS file.
85120
type checksumEntry struct {
86121
hex string
@@ -313,10 +348,22 @@ func validateChecksumRelPath(rel string) (string, error) {
313348
var ErrChecksumsPathTraversal = errors.New("backup: CHECKSUMS path escapes dump root")
314349

315350
// splitChecksumLine parses one sha256sum(1) line of the form
316-
// `<hex> <relative-path>`. Returns (hex, path, ok). Lines with the
317-
// wrong shape are reported as not-ok so the caller can surface a
318-
// CHECKSUMS-corruption error rather than ingest garbage.
351+
// `<hex> <relative-path>` (or its `\\`-prefixed escaped variant
352+
// when the path contains `\n`/`\r`/`\\`). Returns (hex, path, ok).
353+
// Lines with the wrong shape are reported as not-ok so the
354+
// caller can surface a CHECKSUMS-corruption error rather than
355+
// ingest garbage.
356+
//
357+
// Escape convention mirrors GNU coreutils: a leading `\\` signals
358+
// that the rest of the path has `\n`/`\r`/`\\` written as the
359+
// two-byte sequences `\\n`, `\\r`, `\\\\`. Codex r4 P2 on PR #810:
360+
// without this, a filename containing `\n` would split across
361+
// lines and a verifier would see a bogus second row.
319362
func splitChecksumLine(line string) (string, string, bool) {
363+
escaped := strings.HasPrefix(line, `\`)
364+
if escaped {
365+
line = line[1:]
366+
}
320367
// sha256sum's "binary" mode separates digest and filename with
321368
// exactly two spaces. Some tools emit `<hex> *<path>` for
322369
// binary mode; we accept both.
@@ -332,7 +379,45 @@ func splitChecksumLine(line string) (string, string, bool) {
332379
if !strings.HasPrefix(rest, " ") && !strings.HasPrefix(rest, " *") {
333380
return "", "", false
334381
}
335-
return hexPart, rest[2:], true
382+
body := rest[2:]
383+
if !escaped {
384+
return hexPart, body, true
385+
}
386+
unescaped, ok := unescapeChecksumPath(body)
387+
if !ok {
388+
return "", "", false
389+
}
390+
return hexPart, unescaped, true
391+
}
392+
393+
// unescapeChecksumPath reverses formatChecksumLine's `\n`/`\r`/`\\`
394+
// substitutions. Returns (path, ok). An unrecognised `\<byte>` or
395+
// a dangling trailing `\` is treated as malformed — the line is
396+
// rejected at the caller via the ok=false return.
397+
func unescapeChecksumPath(body string) (string, bool) {
398+
var sb strings.Builder
399+
sb.Grow(len(body))
400+
for i := 0; i < len(body); i++ {
401+
if body[i] != '\\' {
402+
sb.WriteByte(body[i])
403+
continue
404+
}
405+
i++
406+
if i >= len(body) {
407+
return "", false
408+
}
409+
switch body[i] {
410+
case 'n':
411+
sb.WriteByte('\n')
412+
case 'r':
413+
sb.WriteByte('\r')
414+
case '\\':
415+
sb.WriteByte('\\')
416+
default:
417+
return "", false
418+
}
419+
}
420+
return sb.String(), true
336421
}
337422

338423
func isLowerHex(s string) bool {

internal/backup/checksums_test.go

Lines changed: 77 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -308,6 +308,83 @@ func TestValidateChecksumRelPath_AcceptsHonestPaths(t *testing.T) {
308308
}
309309
}
310310

311+
// TestChecksumLine_EscapeRoundTrip is the regression for codex r4
312+
// P2 on PR #810: a path containing `\n`/`\r`/`\\` would corrupt
313+
// the line-delimited CHECKSUMS without GNU coreutils-style
314+
// escaping. WriteChecksums prefixes such lines with `\\` and
315+
// substitutes `\n`/`\r`/`\\` → `\\n`/`\\r`/`\\\\`; VerifyChecksums
316+
// reverses the substitution at parse time. Round-trip pins both
317+
// halves together.
318+
func TestChecksumLine_EscapeRoundTrip(t *testing.T) {
319+
t.Parallel()
320+
cases := []struct {
321+
name string
322+
path string
323+
}{
324+
{"newline", "weird\nfile"},
325+
{"carriage_return", "weird\rfile"},
326+
{"backslash", `weird\file`},
327+
{"multiple_escapes", "a\n\rb\\c"},
328+
{"no_escape_pass_through", "honest/path.txt"},
329+
}
330+
for _, tc := range cases {
331+
t.Run(tc.name, func(t *testing.T) {
332+
t.Parallel()
333+
hex := "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
334+
body, prefix := formatChecksumLine(hex, tc.path)
335+
gotHex, gotPath, ok := splitChecksumLine(prefix + body)
336+
if !ok {
337+
t.Fatalf("splitChecksumLine(%q) rejected its own output", prefix+body)
338+
}
339+
if gotHex != hex {
340+
t.Fatalf("hex = %q, want %q", gotHex, hex)
341+
}
342+
if gotPath != tc.path {
343+
t.Fatalf("path = %q, want %q", gotPath, tc.path)
344+
}
345+
})
346+
}
347+
}
348+
349+
// TestChecksumLine_EscapePrefixMatchesGNUFormat pins the exact
350+
// shape WriteChecksums emits for escape-needing paths:
351+
//
352+
// \<hex> <body-with-\n-\r-\\\>
353+
//
354+
// so `sha256sum -c CHECKSUMS` on the dump root parses the line
355+
// the same way GNU coreutils does.
356+
func TestChecksumLine_EscapePrefixMatchesGNUFormat(t *testing.T) {
357+
t.Parallel()
358+
hex := "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
359+
body, prefix := formatChecksumLine(hex, "x\ny")
360+
if prefix != `\` {
361+
t.Fatalf("escape prefix = %q, want %q", prefix, `\`)
362+
}
363+
wantBody := hex + ` x\ny`
364+
if body != wantBody {
365+
t.Fatalf("body = %q, want %q", body, wantBody)
366+
}
367+
}
368+
369+
// TestSplitChecksumLine_RejectsDanglingEscape pins the parse-side
370+
// guard against malformed escape sequences (`\` at end-of-line, or
371+
// `\x` for unknown `x`). Catches a tampered CHECKSUMS whose
372+
// escape-prefixed line is otherwise structurally valid.
373+
func TestSplitChecksumLine_RejectsDanglingEscape(t *testing.T) {
374+
t.Parallel()
375+
hex := "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
376+
cases := []string{
377+
`\` + hex + ` trailing\`, // dangling \ at end
378+
`\` + hex + ` bad\xescape`, // unknown escape \x
379+
}
380+
for _, c := range cases {
381+
_, _, ok := splitChecksumLine(c)
382+
if ok {
383+
t.Fatalf("splitChecksumLine(%q) accepted malformed escape", c)
384+
}
385+
}
386+
}
387+
311388
// TestVerifyChecksums_StreamsLargeFile pins the bufio.Scanner path
312389
// — the previous implementation slurped the entire CHECKSUMS file
313390
// via os.ReadFile, which OOMs on large dump trees (gemini

0 commit comments

Comments
 (0)