Skip to content

Commit 145a912

Browse files
committed
backup: #904 v9 - stat InputRoot + fix godoc attribution
Two findings on v8. ## Codex P2 v8: silent empty-restore on missing/stale InputRoot validateEncodeOptions only rejected the empty string. A library caller that passed a typo'd or deleted InputRoot would fan-out into every enabled adapter, each treating its missing top-level subdirectory as a no-op. Result: a "successful" header-only .fsm — exactly the silent empty-restore artifact the encoder is supposed to fail closed against. CLI callers don't hit this path (they open MANIFEST.json under InputRoot first), but the library entrypoint is now a real surface, so the guard belongs in EncodeSnapshot itself. Fix: validateEncodeOptions now os.Stats InputRoot after the empty- string check, rejecting non-existent paths (wraps the stat error) and regular files (returns a typed error). Both rejection paths run before any byte is written and before any adapter is invoked. Pinned by TestEncodeSnapshotRejectsMissingInputRoot (subtests: non-existent path; regular file). Existing TestEncodeSnapshot LibraryRoundTrip + the manifest-floor tests use t.TempDir(), which is a directory, so they continue to pass. ## Claude v8 doc bug: EncodeSnapshot godoc attributed to wrong symbol The EncodeSnapshot doc block sat directly above the validateEncodeOptions doc block with no blank line between them, so godoc tooling (and gopls) attributed the entire merged comment to validateEncodeOptions. EncodeSnapshot — the only exported function in this file — had no godoc at all. Fix: relocate the EncodeSnapshot doc block to immediately precede `func EncodeSnapshot`, separated from validateEncodeOptionsData by a blank line. Content unchanged; only the location moved. ## Caller audit per CLAUDE.md semantic-change rule - validateEncodeOptions gained a fail-closed pre-condition (stat + IsDir). All in-tree callers pass either t.TempDir() (directory) or a real backup root (directory). No legitimate caller is impacted; bad-input callers now surface a typed error instead of silently producing a header-only .fsm — strictly safer. - No exported signature or error semantics changed beyond the new rejection path. Tests + lint green.
1 parent 72fb54d commit 145a912

2 files changed

Lines changed: 94 additions & 33 deletions

File tree

internal/backup/encode_snapshot.go

Lines changed: 46 additions & 33 deletions
Original file line numberDiff line numberDiff line change
@@ -147,44 +147,28 @@ type EncodeResult struct {
147147
AdaptersEnabled []string
148148
}
149149

150-
// EncodeSnapshot reads the directory tree at opts.InputRoot, invokes the
151-
// enabled per-adapter encoders in canonical fan-out order, optionally
152-
// runs the round-trip self-test, and writes the .fsm bytes to out.
153-
// The .fsm bytes are NOT returned; they go to out.
154-
//
155-
// When opts.SelfTest=false the FSM streams straight to out with a
156-
// sha256 tee and no extra buffering. When opts.SelfTest=true the FSM
157-
// is written to an on-disk temp file (encode-self-test-fsm-*) under
158-
// opts.SelfTestDecodeOptions.OutRoot, the file is streamed through
159-
// DecodeSnapshot, and bytes are copied to out ONLY if the decode
160-
// survives. Memory cost in self-test mode is O(1) on top of the
161-
// sort working set (gemini high #904 — the earlier *bytes.Buffer
162-
// version would OOM on multi-GB snapshots).
163-
//
164-
// Self-test failure returns (result, nil) with result.SelfTestMatched
165-
// == false and result.SelfTestMismatchTxt populated. Callers MUST
166-
// check result.SelfTestMatched before treating a nil error as success.
167-
// The CLI relies on this contract to write mismatch.txt + exit 2;
168-
// library callers should follow the same pattern.
169-
//
170-
// EncodeSnapshot does NOT read MANIFEST.json itself, but it WILL
171-
// enforce a floor on opts.LastCommitTS when the caller threads the
172-
// manifest value through opts.ManifestLastCommitTS — a low
173-
// LastCommitTS returns ErrSelfTestLowerLastCommitTS BEFORE any bytes
174-
// are written. The CLI's resolveLastCommitTS sets both fields to the
175-
// reconciled values, and library callers SHOULD do the same. The
176-
// check is opt-in (ManifestLastCommitTS=0 disables it) so synthetic
177-
// test fixtures without a manifest reference can still call this
178-
// directly (codex P2 v2 #904).
179150
// validateEncodeOptions enforces the four pre-encode invariants:
180-
// InputRoot/out non-nil, non-empty adapter selection, optional
181-
// manifest-TS floor, and DDB JSONL guard. Split out so EncodeSnapshot
182-
// stays under the cyclop threshold; per-check helpers below keep each
183-
// branch's intent narrow.
151+
// InputRoot non-empty + exists-as-directory, out non-nil, non-empty
152+
// adapter selection, optional manifest-TS floor, and DDB JSONL guard.
153+
// Split out so EncodeSnapshot stays under the cyclop threshold; the
154+
// data-correctness checks live in validateEncodeOptionsData.
184155
func validateEncodeOptions(opts EncodeOptions, out io.Writer) error {
185156
if opts.InputRoot == "" {
186157
return errors.New("backup: EncodeOptions.InputRoot is required")
187158
}
159+
// Stat the path so a typo'd or deleted directory surfaces here
160+
// rather than fan-out-no-op'ing every adapter and producing a
161+
// header-only .fsm (codex P2 v8 #904). CLI callers indirectly
162+
// catch this via os.Open(MANIFEST.json) before EncodeSnapshot,
163+
// but a library caller that passes a stale path needs the guard
164+
// at this layer.
165+
info, statErr := os.Stat(opts.InputRoot)
166+
if statErr != nil {
167+
return errors.Wrapf(statErr, "stat InputRoot %q", opts.InputRoot)
168+
}
169+
if !info.IsDir() {
170+
return errors.Errorf("backup: InputRoot %q is not a directory", opts.InputRoot)
171+
}
188172
if out == nil {
189173
return errors.New("backup: EncodeSnapshot out writer is nil")
190174
}
@@ -214,6 +198,35 @@ func validateEncodeOptionsData(opts EncodeOptions) error {
214198
return nil
215199
}
216200

201+
// EncodeSnapshot reads the directory tree at opts.InputRoot, invokes the
202+
// enabled per-adapter encoders in canonical fan-out order, optionally
203+
// runs the round-trip self-test, and writes the .fsm bytes to out.
204+
// The .fsm bytes are NOT returned; they go to out.
205+
//
206+
// When opts.SelfTest=false the FSM streams straight to out with a
207+
// sha256 tee and no extra buffering. When opts.SelfTest=true the FSM
208+
// is written to an on-disk temp file (encode-self-test-fsm-*) under
209+
// opts.SelfTestDecodeOptions.OutRoot, the file is streamed through
210+
// DecodeSnapshot, and bytes are copied to out ONLY if the decode
211+
// survives. Memory cost in self-test mode is O(1) on top of the
212+
// sort working set (gemini high #904 — the earlier *bytes.Buffer
213+
// version would OOM on multi-GB snapshots).
214+
//
215+
// Self-test failure returns (result, nil) with result.SelfTestMatched
216+
// == false and result.SelfTestMismatchTxt populated. Callers MUST
217+
// check result.SelfTestMatched before treating a nil error as success.
218+
// The CLI relies on this contract to write mismatch.txt + exit 2;
219+
// library callers should follow the same pattern.
220+
//
221+
// EncodeSnapshot does NOT read MANIFEST.json itself, but it WILL
222+
// enforce a floor on opts.LastCommitTS when the caller threads the
223+
// manifest value through opts.ManifestLastCommitTS — a low
224+
// LastCommitTS returns ErrSelfTestLowerLastCommitTS BEFORE any bytes
225+
// are written. The CLI's resolveLastCommitTS sets both fields to the
226+
// reconciled values, and library callers SHOULD do the same. The
227+
// check is opt-in (ManifestLastCommitTS=0 disables it) so synthetic
228+
// test fixtures without a manifest reference can still call this
229+
// directly (codex P2 v2 #904).
217230
func EncodeSnapshot(opts EncodeOptions, out io.Writer) (EncodeResult, error) {
218231
if err := validateEncodeOptions(opts, out); err != nil {
219232
return EncodeResult{}, err

internal/backup/encode_snapshot_test.go

Lines changed: 48 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -211,6 +211,54 @@ func TestEncodeSnapshotRequiresInputRoot(t *testing.T) {
211211
}
212212
}
213213

214+
// TestEncodeSnapshotRejectsMissingInputRoot pins codex P2 v8 #904: a
215+
// non-existent or non-directory InputRoot must be rejected before any
216+
// adapter runs. Otherwise each enabled adapter treats its missing
217+
// top-level subdirectory as a no-op, the call "succeeds", and the
218+
// caller gets a header-only .fsm — a silent empty-restore artifact.
219+
// CLI callers don't hit this path (they open MANIFEST.json first),
220+
// but library callers can pass a stale path, so the guard belongs in
221+
// EncodeSnapshot itself.
222+
func TestEncodeSnapshotRejectsMissingInputRoot(t *testing.T) {
223+
t.Parallel()
224+
t.Run("non-existent path", func(t *testing.T) {
225+
t.Parallel()
226+
missing := filepath.Join(t.TempDir(), "does-not-exist")
227+
var buf bytes.Buffer
228+
_, err := EncodeSnapshot(EncodeOptions{
229+
InputRoot: missing,
230+
Adapters: AdapterSet{SQS: true},
231+
LastCommitTS: 1,
232+
}, &buf)
233+
if err == nil {
234+
t.Fatalf("EncodeSnapshot with non-existent InputRoot succeeded; want error")
235+
}
236+
if buf.Len() != 0 {
237+
t.Errorf("buf.Len = %d, want 0 (no bytes should be written for missing InputRoot)", buf.Len())
238+
}
239+
})
240+
t.Run("regular file", func(t *testing.T) {
241+
t.Parallel()
242+
dir := t.TempDir()
243+
filePath := filepath.Join(dir, "not-a-dir")
244+
if err := os.WriteFile(filePath, []byte("x"), 0o600); err != nil {
245+
t.Fatalf("WriteFile: %v", err)
246+
}
247+
var buf bytes.Buffer
248+
_, err := EncodeSnapshot(EncodeOptions{
249+
InputRoot: filePath,
250+
Adapters: AdapterSet{SQS: true},
251+
LastCommitTS: 1,
252+
}, &buf)
253+
if err == nil {
254+
t.Fatalf("EncodeSnapshot with file-as-InputRoot succeeded; want error")
255+
}
256+
if buf.Len() != 0 {
257+
t.Errorf("buf.Len = %d, want 0 (no bytes should be written for non-directory InputRoot)", buf.Len())
258+
}
259+
})
260+
}
261+
214262
// TestEncodeSnapshotRejectsLowManifestFloor pins codex P2 v2: the
215263
// library-level HLC floor check fails-closed when opts.LastCommitTS
216264
// is below opts.ManifestLastCommitTS. Defense-in-depth for the CLI's

0 commit comments

Comments
 (0)