Skip to content

fix(vault): accept and persist content on POST/PUT vault/documents#1174

Merged
thieung merged 4 commits into
nextlevelbuilder:devfrom
aaron-tsar:fix/vault-doc-content-on-create
May 28, 2026
Merged

fix(vault): accept and persist content on POST/PUT vault/documents#1174
thieung merged 4 commits into
nextlevelbuilder:devfrom
aaron-tsar:fix/vault-doc-content-on-create

Conversation

@aaron-tsar
Copy link
Copy Markdown
Contributor

Summary

POST /v1/agents/{id}/vault/documents (and its tenant-wide twin) declared a body struct without a content field, so JSON payloads carrying content got HTTP 201 with a doc id but the bytes were silently discarded — content_hash stayed empty, no file was written, and the enrichment pipeline never fired.

This makes content an optional field on both POST and PUT. When present, the bytes land on disk under <tenant-workspace>/<path>, a SHA-256 hash is stored on the row, and an EventVaultDocUpserted is published — the same downstream behaviour as POST /v1/vault/upload, but available from JSON callers that don't want to bother with multipart.

Omitting content preserves the historical metadata-only behaviour, so existing callers (e.g. those that materialise files via filesystem rescan) keep working.

Why

Programmatic clients that already have UTF-8 text in memory — CI jobs, IDE agent skills, MCP tools — have no clean way to populate the vault today:

  • POST /v1/{agents/{id}}/vault/documents accepts JSON but drops the content.
  • POST /v1/vault/upload accepts multipart but flattens paths to basename and is awkward to build from a non-form HTTP client.
  • Filesystem rescan works, but assumes callers can write to the goclaw server's filesystem directly (often impractical when the client is remote).

I hit this writing a per-repo "ingest" tool for Claude Code skills against a self-hosted goclaw — markdown files were registered but every doc was an empty shell.

Changes

  • internal/http/vault_handler_documents.go
    • handleCreateDocument: add Content string to the body struct; when set, validate extension against the existing allowedUploadExts whitelist, write to disk inside the tenant workspace (with abs-path escape check), set ContentHash, and publish EventVaultDocUpserted after the DB upsert so the enrichment worker can locate the row.
    • handleUpdateDocument: add Content *string (nil = no change). Same on-disk write/hash/event flow when set.
    • Two small helpers (writeDocumentContent, publishDocUpserted) so both handlers share the implementation.

No store-layer, schema, or event-bus changes. No new dependencies.

Backwards compatibility

100% — callers that don't send content get the exact prior behaviour. Existing metadata-only rows on disk remain valid; a subsequent PUT with content materialises them.

Test plan

  • go build ./... clean
  • go test ./internal/http/ passes
  • Live test against a dev instance:
    • POST with content → DB row inserted, file at <workspace>/<path>, SHA-256 matches request body, content_hash populated.
    • PUT with content → file rewritten, content_hash updated.
    • POST without content → unchanged behaviour: empty content_hash, no file on disk.
    • Path escape attempts (../.., absolute, symlink target outside workspace) → rejected with HTTP 400.
  • Unit tests for the new helpers — happy to add in a follow-up if reviewers prefer them in this PR.

🤖 Generated with Claude Code

@thieung
Copy link
Copy Markdown
Contributor

thieung commented May 25, 2026

@clark-cant Review this PR

@clark-cant
Copy link
Copy Markdown

PR Review

Verdict: REQUEST CHANGES

Summary

This fixes a real gap: JSON POST/PUT vault/documents can now persist inline content to disk, set content_hash, and trigger vault enrichment. Scope is small and generally aligned with existing upload behavior.

Risk level: medium-high because this introduces direct filesystem writes from JSON input.

Findings

  1. [HIGH] internal/http/vault_handler_documents.gowriteDocumentContent does not actually protect against symlink escape.

The current check only verifies the lexical absolute path:

target := filepath.Join(workspace, filepath.Clean(relPath))
absTarget, _ := filepath.Abs(target)
absWS, _ := filepath.Abs(workspace)
strings.HasPrefix(absTarget, absWS + string(os.PathSeparator))

This blocks ../ and absolute-path escapes, but it does not resolve symlinks. If the tenant workspace contains a symlink such as workspace/link -> /tmp/outside, then writing link/pwn.md still passes the prefix check while os.WriteFile follows the symlink and writes outside the workspace.

The PR body says symlink targets outside the workspace are rejected, but the implementation does not enforce that yet.

Please resolve symlinks for the parent/target path before writing, or use a no-symlink-safe file creation strategy. Add a regression test for a symlink inside the workspace pointing outside.

  1. [MEDIUM] internal/http/vault_handler_documents.go — JSON document writes publish enrichment events without starting enrichment progress.

handleUpload calls h.enrichProgress.Start(created, tenantID) before publishing EventVaultDocUpserted, with a comment noting this avoids a worker race. The new JSON path publishes the event directly through publishDocUpserted without starting progress.

If enrichment progress/UI depends on this, JSON content writes will not behave the same as multipart upload despite the PR describing the same downstream behavior. Please either mirror the upload progress flow or document why this path intentionally skips progress tracking.

  1. [LOW] Missing unit tests for the risky paths.

Please add tests for at least:

  • symlink escape rejection
  • invalid extension rejection
  • PUT with empty content
  • event emitted after DB upsert
  • DB upsert failure after file write behavior, if orphan files are acceptable or need cleanup

Anti-AI-Slop Check

  • Intent clear: yes
  • Diff scope justified: yes
  • Refactor justified: yes
  • Author explanation needed: no

Tests / CI

  • CI status: no check runs reported; commit status is pending.
  • The PR describes live testing, but the security-sensitive filesystem behavior needs automated regression coverage.

Final Notes

Good fix direction, but filesystem writes need to be airtight before merge. The symlink escape issue is blocking.

aaron-tsar added a commit to aaron-tsar/goclaw that referenced this pull request May 27, 2026
Addresses review on nextlevelbuilder#1174:

- writeDocumentContent now resolves symlinks before writing. A symlink
  living inside the workspace but pointing outside (e.g. workspace/link
  -> /etc) previously passed the lexical prefix check and os.WriteFile
  followed it out of bounds. We now EvalSymlinks the workspace root and
  the deepest existing ancestor of the target, re-check containment
  before any mkdir/write, and refuse a symlink planted at the final
  component via Lstat.
- publishDocUpserted starts enrichment progress (enrichProgress.Start)
  before publishing EventVaultDocUpserted, matching handleUpload's
  ordering so the worker pool can't drain the event before the progress
  tracker registers it.
- Add regression tests: symlink dir escape, final-component symlink,
  parent escape, empty content, disallowed extension, event-after-upsert
  ordering, metadata-only no-write, and upsert-failure orphan behaviour.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@aaron-tsar
Copy link
Copy Markdown
Contributor Author

Thanks @clark-cant — addressed all three in 9349da8.

[HIGH] Symlink escapewriteDocumentContent no longer trusts the lexical prefix check alone. It now:

  • EvalSymlinks the workspace root so containment compares canonical paths;
  • resolves the deepest already-existing ancestor of the target and re-checks containment before any MkdirAll/write, so a workspace/link -> /etc symlink is rejected before a byte is written (and before the dir is created);
  • Lstats the final component and refuses if it is itself a symlink, so an attacker can't redirect the write onto an existing outside file.

Regression tests cover the symlinked-directory escape and the final-component-symlink case (plus ../ escape).

[MEDIUM] Progress vs. publish racepublishDocUpserted now calls enrichProgress.Start(1, tenantID) before eventBus.Publish, matching handleUpload's ordering so the worker pool can't drain the event ahead of the progress tracker. Added a test asserting the event fires after the DB upsert.

[LOW] Tests — new vault_handler_documents_test.go covers: symlink escape (dir + final component), parent escape, empty content, disallowed-extension rejection, event-emitted-after-upsert ordering, metadata-only no-write, and the write-then-upsert-fail orphan behaviour (documented as intentional: file is left in place, no event fired).

Builds clean on both go build ./internal/http/ and -tags sqliteonly; full internal/http package tests pass.

Copy link
Copy Markdown
Contributor

@thieung thieung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this — the dropped content field is real and the fix matches the upload path closely. A few blockers and hardening gaps before this is ready to merge; inline notes below.

Blocking

B1 — non-master tenants get HTTP 500 on first content write. resolveTenantWorkspace returns workspace/tenants/{slug}/ for any non-master tenant (internal/config/tenant_paths.go:38–53), and that directory is only created on demand by handleUpload/handleRescan. filepath.EvalSymlinks on a missing path returns os.ErrNotExist, so the very first JSON POST/PUT with content for a fresh tenant 500s. All tests use the master tenant, where TenantWorkspace returns workspace unchanged, so the regression isn't covered. See inline at line 28.

Important

  • I1 TOCTOU between Lstat and WriteFile — PR is sold as symlink hardening, but the final write still has no O_NOFOLLOW. Inline at line 72.
  • I3 Rejected escape attempts don't emit a security.* slog event — peer surface (internal/http/storage.go:158, :175) does. Inline at line 36.
  • I4 POST/PUT content semantics are asymmetric (string vs *string, "" means "absent" on POST but "clear file + fire event" on PUT). Inline at line 219.

Test gaps

  • No non-master tenant test (the B1 hotspot).
  • No test pinning PUT Content = nil → no write, no event.
  • No test for handler-level .. rejection at :232.

The core change (hash + enrichment ordering + lexical containment) is correct and well-tested for the master-tenant path — B1 is mainly about closing the multi-tenant gap before merge.

func (h *VaultHandler) writeDocumentContent(workspace, relPath string, content []byte) (string, error) {
// Resolve the workspace root through any symlinks so all containment checks
// compare canonical paths.
realWS, err := filepath.EvalSymlinks(workspace)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking (B1). EvalSymlinks returns os.ErrNotExist for a path that doesn't exist yet, and for non-master tenants workspace here is workspace/tenants/{slug}/ which is only created on demand by handleUpload (vault_handler_upload.go:109 does os.MkdirAll(...) before writing). On a fresh non-master tenant whose first vault interaction is this endpoint, the call 500s with failed to write content.

Suggested fix: either os.MkdirAll(workspace, 0o755) before EvalSymlinks, or fall through to lexical containment on errors.Is(err, os.ErrNotExist). Either way, please add a regression test with a non-master tenant UUID — current tests all use masterTenant where TenantWorkspace returns workspace unchanged, so this path isn't exercised.

return "", os.ErrInvalid
}

if err := os.WriteFile(target, content, 0o644); err != nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important (I1). The Lstat check above followed by os.WriteFile is a TOCTOU window: os.WriteFile opens with O_WRONLY|O_CREATE|O_TRUNC (no O_NOFOLLOW), so a tenant who can race the request can swap a regular file for a symlink between the two calls. Same race exists in handleUpload (vault_handler_upload.go:164 uses os.Create), so it isn't a new bug — but since this PR explicitly bills itself as symlink-containment hardening, it's worth closing here.

Suggestion: replace this os.WriteFile with os.OpenFile(dst, os.O_WRONLY|os.O_CREATE|os.O_TRUNC|syscall.O_NOFOLLOW, 0o644) then write + close, or reuse the evalSymlinkOrClean + pathWithinDir helpers in internal/http/storage.go:131–148 which already handle this surface.


// Lexical containment: blocks "../" and absolute-path escapes.
if target != realWS && !strings.HasPrefix(target, realWS+string(os.PathSeparator)) {
return "", os.ErrInvalid
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important (I3). This (and lines 49, 69) reject symlink/path-escape attempts by returning os.ErrInvalid, but the only resulting log line is the generic vault.create: write content failed at line 299 — oncall can't alert on attempted escapes. Compare with internal/http/storage.go:158 (slog.Warn("security.storage_symlink_escape", "resolved", ..., "base", ...)) and :175 — the codebase pattern is a security.* slog at the rejection site itself. Suggest adding slog.Warn("security.vault_symlink_escape", ...) with the resolved path + workspace at each of the three rejection points before returning.

var body struct {
Path string `json:"path"`
Title string `json:"title"`
Content string `json:"content"` // optional; when set, written to disk
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important (I4). POST uses Content string while PUT (line 351) uses Content *string. Result: on POST, content: "" and "content not provided" are indistinguishable (both skip the write); on PUT, content: "" is a destructive action — it writes a 0-byte file, sets content_hash = sha256(""), and fires EventVaultDocUpserted. A caller who copy-pastes a payload between the two endpoints will be surprised.

Suggest either: (a) make POST also *string and treat nil vs "" consistently with PUT, or (b) reject content: "" on PUT with a 400 (and use DELETE for clearing) so the API only writes when bytes were actually sent. Whichever you pick, please document it in the handler godoc — the comment at line 351 documents PUT's behavior but not the asymmetry.


// masterTenant resolves TenantWorkspace to h.workspace unchanged, keeping the
// content-write tests pointed at a single temp dir.
var masterTenant = uuid.MustParse("0193a5b0-7000-7000-8000-000000000001")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test coverage gap. Every handler-level test uses masterTenant, which is the one case where config.TenantWorkspace returns workspace unchanged (see internal/config/tenant_paths.go:39). That hides the B1 regression entirely. Please add at least one test with a non-master tenant UUID + tenant slug context (via store.WithTenantID + store.WithTenantSlug) covering the first content write into a fresh workspace/tenants/{slug}/ dir.

Also missing:

  • A test pinning PUT with Content = nil → no write, no event (locks the "no change" contract).
  • A handler-level test for path containing .. exercising the rejection at vault_handler_documents.go:232 (the inner writeDocumentContent parent-escape test doesn't cover the handler short-circuit).

@mrgoonie
Copy link
Copy Markdown
Contributor

Backlog review: high-value but not merge-ready yet. The bug is real: content in vault document POST/PUT was accepted but not persisted, and this PR adds path/symlink containment plus enrichment event coverage. Current blockers: PR is BLOCKED/no checks reported and targets main, so retarget/rebase onto dev and run CI first. One behavior to confirm while rebasing: create currently treats content:"" as omitted even though update supports empty content and writeDocumentContent has an empty-file test.

aaron-tsar added a commit to aaron-tsar/goclaw that referenced this pull request May 28, 2026
Addresses second review on nextlevelbuilder#1174 (thieung):

**B1 (blocking)** — first JSON content write into a fresh non-master
tenant returned 500 because workspace/tenants/{slug}/ doesn't exist
yet and EvalSymlinks fails with ErrNotExist. writeDocumentContent now
MkdirAlls the workspace before resolving (mirrors handleUpload). Added
a regression test that drives a non-master tenant context end-to-end
and asserts the file lands under workspace/tenants/{slug}/.

**I1** — replaced the Lstat→WriteFile sequence (TOCTOU window) with
os.OpenFile(O_WRONLY|O_CREATE|O_TRUNC|O_NOFOLLOW). The Lstat short-fail
stays as defense-in-depth + cleaner error logging. Portability handled
via build-tagged constants in vault_handler_documents_nofollow_{unix,
windows}.go — O_NOFOLLOW is no-op on Windows (desktop edition is
single-user with no untrusted tenants).

**I3** — every symlink/path-escape rejection now emits
slog.Warn("security.vault_symlink_escape", ...) with the rejection
site, resolved path and workspace, matching the storage.go pattern
so oncall can alert on attempted escapes.

**I4** — POST `content` field changed from `string` to `*string` to
match PUT. Nil pointer = "no write" (metadata-only stub); empty string
= "write a 0-byte file + fire event"; non-empty = write bytes. Updated
handler godoc to spell out the symmetry. Switched `werr == os.ErrInvalid`
to `errors.Is` per repo conventions.

**Tests added** — non-master tenant first content write (B1), PUT with
nil content (locks "no change" contract), handler-level ".." path
rejection, POST content:"" writes empty file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@aaron-tsar
Copy link
Copy Markdown
Contributor Author

Thanks @thieung, all findings addressed in ba59e7b.

B1 (blocking)writeDocumentContent now os.MkdirAlls the workspace before EvalSymlinks, so the first JSON content write into a fresh non-master tenant no longer 500s. Added TestHandleCreateDocument_NonMasterTenantFirstContentWrite that drives a non-master tenant UUID + slug via store.WithTenantID+store.WithTenantSlug and asserts the file lands under workspace/tenants/{slug}/.

I1 (TOCTOU) — replaced os.WriteFile with os.OpenFile(O_WRONLY|O_CREATE|O_TRUNC|O_NOFOLLOW). The pre-open Lstat stays as defense-in-depth + clean security-log emission. Portability via two new build-tagged files:

  • vault_handler_documents_nofollow_unix.goconst oNoFollow = syscall.O_NOFOLLOW + isSymlinkLoopErr = errors.Is(err, syscall.ELOOP)
  • vault_handler_documents_nofollow_windows.gooNoFollow = 0, isSymlinkLoopErr always false (Windows has no O_NOFOLLOW; desktop edition is single-user so degradation is acceptable, with the Lstat as best-effort guard)

If the open returns ELOOP, the request is rejected with os.ErrInvalid + security.vault_symlink_escape (site open_nofollow).

I considered reusing evalSymlinkOrClean / pathWithinDir from internal/http/storage.go but the storage helpers don't currently use O_NOFOLLOW on the write either — porting the race-close to the upload path is its own follow-up. Happy to do that in a separate PR if you want; kept this one tight to the JSON document handler.

I3 — every rejection site now emits slog.Warn("security.vault_symlink_escape", "site", "<which check>", "resolved"|"target"|"attempted", ..., "workspace", realWS). Four sites total: lexical, ancestor, final_lstat, open_nofollow.

I4 — POST Content is now *string (was string), aligned with PUT. Updated handler godoc:

  • nil pointer = no write, no event (metadata-only stub)
  • "" = write a 0-byte file + fire event (was previously indistinguishable from absent on POST — that asymmetry is gone)
  • non-empty = write bytes

Also switched werr == os.ErrInvaliderrors.Is(werr, os.ErrInvalid) per the repo conventions in CLAUDE.md.

New tests (4):

  • TestHandleCreateDocument_NonMasterTenantFirstContentWrite — covers B1
  • TestHandleUpdateDocument_NilContentSkipsWriteAndEvent — locks the "no change" contract
  • TestHandleCreateDocument_RejectsParentTraversalPath — the handler short-circuit at vault_handler_documents.go:263
  • TestHandleCreateDocument_EmptyContentWritesEmptyFile — locks the I4 symmetry

Full internal/http package: 14 vault-doc tests pass, no regressions in the rest of the suite. Both go build ./internal/http/ and -tags sqliteonly are clean.


@mrgoonie — re: retargeting onto dev: upstream/main is not an ancestor of upstream/dev (main carries a hotfix chain that dev doesn't have yet), so a plain base-change in the GitHub UI would make the PR diff include those extras. Want me to rebase fix/vault-doc-content-on-create onto upstream/dev and force-push (clean two-commit diff) before retargeting? Let me know and I'll do it in the next round.

@thieung thieung dismissed their stale review May 28, 2026 13:40

All four blocking/important findings addressed in ba59e7b with full test coverage. Re-reviewing as approve.

thieung
thieung previously approved these changes May 28, 2026
Copy link
Copy Markdown
Contributor

@thieung thieung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving — addresses all four findings from the prior review with full test coverage:

  • B1 — before EvalSymlinks (vault_handler_documents.go:33); TestHandleCreateDocument_NonMasterTenantFirstContentWrite exercises the slug-resolved path end-to-end.
  • I1os.WriteFile replaced with O_WRONLY|O_CREATE|O_TRUNC|oNoFollow open via build-tagged oNoFollow constants (unix=syscall.O_NOFOLLOW, windows=0 with documented degradation). ELOOP mapped to os.ErrInvalid for clean 400.
  • I3 — Four slog.Warn("security.vault_symlink_escape", "site", ...) lines covering all rejection sites (lexical / ancestor / final_lstat / open_nofollow); verified live in test logs.
  • I4 — POST Content is now *string, symmetric with PUT; godoc enumerates the three states (nil / "" / non-empty); errors.Is swap done.

All tests pass on PG and sqliteonly builds; vet clean.

Three trivial non-blocking nits if you ever revisit:

  1. defer f.Close() at vault_handler_documents.go:91 drops close errors — wouldn't affect security, but a flushed-disk-full would still return success. Trivial fix.
  2. site=lexical slog label is technically a path traversal, not a symlink — cosmetic; the alert key still fits.
  3. mustJSON test helper duplicates json.Marshal in newJSONRequest — minor DRY.

Nice work on the build-tagged Windows fallback and the explicit godoc on the empty-string contract.

Copy link
Copy Markdown
Contributor

@thieung thieung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving — ba59e7b0 addresses all four findings from the prior review with full test coverage:

  • B1os.MkdirAll(workspace, 0o755) before EvalSymlinks (vault_handler_documents.go:33); TestHandleCreateDocument_NonMasterTenantFirstContentWrite exercises the slug-resolved path end-to-end.
  • I1os.WriteFile replaced with O_WRONLY|O_CREATE|O_TRUNC|oNoFollow open via build-tagged oNoFollow constants (unix=syscall.O_NOFOLLOW, windows=0 with documented degradation). ELOOP mapped to os.ErrInvalid for clean 400.
  • I3 — Four slog.Warn("security.vault_symlink_escape", "site", ...) lines covering all rejection sites (lexical / ancestor / final_lstat / open_nofollow); verified live in test logs.
  • I4 — POST Content is now *string, symmetric with PUT; godoc enumerates the three states (nil / "" / non-empty); errors.Is swap done.

All tests pass on PG and sqliteonly builds; vet clean.

Three trivial non-blocking nits if you ever revisit:

  1. defer f.Close() at vault_handler_documents.go:91 drops close errors — wouldn't affect security, but a flushed-disk-full would still return success. Trivial fix.
  2. site=lexical slog label is technically a path traversal, not a symlink — cosmetic; the alert key still fits.
  3. mustJSON test helper duplicates json.Marshal in newJSONRequest — minor DRY.

Nice work on the build-tagged Windows fallback and the explicit godoc on the empty-string contract.

@thieung thieung dismissed their stale review May 28, 2026 13:41

Duplicate — shell-escape mishap. See review 4381446173 for the intended approve body.

@thieung
Copy link
Copy Markdown
Contributor

thieung commented May 28, 2026

Want me to rebase fix/vault-doc-content-on-create onto upstream/dev and force-push (clean two-commit diff) before retargeting?

Yes pls

aaron-tsar and others added 4 commits May 28, 2026 20:55
Previously POST /v1/{agents/{id}/}vault/documents and the matching PUT
endpoint declared a body struct without a `content` field, so callers
passing JSON `{"path": ..., "title": ..., "content": "..."}` got back
HTTP 201 with a doc id, but the bytes were silently discarded:

  - bindJSON ignored the `content` key (no matching field)
  - the document row was inserted with content_hash="" and no file on disk
  - the enrichment pipeline (summary, embeddings, links) was never
    triggered because no EventVaultDocUpserted was published

The endpoint was effectively a metadata-only create — there was no client
path that combined "register doc + supply bytes" via JSON. /v1/vault/upload
exists, but it requires multipart, flattens paths to basename, and is
inconvenient for programmatic clients that already have UTF-8 text in
memory (CI jobs, agent skills, MCP tools).

This change makes `content` an optional field on both handlers:

  - POST: when content is non-empty, validate the file extension against
    the same allowedUploadExts whitelist used by /v1/vault/upload, write
    bytes to <tenant-workspace>/<path>, compute SHA-256, store it as
    content_hash on the doc, and publish EventVaultDocUpserted so the
    enrichment worker picks it up — exactly the same downstream
    behaviour as /v1/vault/upload.
  - PUT: same logic; nil content = no change, non-nil content = rewrite
    the file on disk (and an empty string truncates it).
  - Omitting `content` preserves the historical metadata-only behaviour
    (e.g. for callers that materialise files via filesystem rescan).
  - Path validation reuses the existing rules: no `..`, no absolute
    paths; an extra abs-path check rejects symlink escapes from the
    workspace root.

Two small helpers (writeDocumentContent, publishDocUpserted) factor out
the disk write + event publish so both handlers share the same code,
and so they can be unit-tested in isolation later.

The fix is backwards-compatible: existing callers that don't send
`content` continue to work unchanged. Existing docs created without
content remain metadata-only stubs; a subsequent PUT with `content`
will materialise them.

Verified live on a development instance:
  POST → row inserted, file at <workspace>/<path>, SHA-256 matches,
         enrichment event published.
  PUT  → file rewritten, content_hash updated, updated_at refreshed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses review on nextlevelbuilder#1174:

- writeDocumentContent now resolves symlinks before writing. A symlink
  living inside the workspace but pointing outside (e.g. workspace/link
  -> /etc) previously passed the lexical prefix check and os.WriteFile
  followed it out of bounds. We now EvalSymlinks the workspace root and
  the deepest existing ancestor of the target, re-check containment
  before any mkdir/write, and refuse a symlink planted at the final
  component via Lstat.
- publishDocUpserted starts enrichment progress (enrichProgress.Start)
  before publishing EventVaultDocUpserted, matching handleUpload's
  ordering so the worker pool can't drain the event before the progress
  tracker registers it.
- Add regression tests: symlink dir escape, final-component symlink,
  parent escape, empty content, disallowed extension, event-after-upsert
  ordering, metadata-only no-write, and upsert-failure orphan behaviour.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses second review on nextlevelbuilder#1174 (thieung):

**B1 (blocking)** — first JSON content write into a fresh non-master
tenant returned 500 because workspace/tenants/{slug}/ doesn't exist
yet and EvalSymlinks fails with ErrNotExist. writeDocumentContent now
MkdirAlls the workspace before resolving (mirrors handleUpload). Added
a regression test that drives a non-master tenant context end-to-end
and asserts the file lands under workspace/tenants/{slug}/.

**I1** — replaced the Lstat→WriteFile sequence (TOCTOU window) with
os.OpenFile(O_WRONLY|O_CREATE|O_TRUNC|O_NOFOLLOW). The Lstat short-fail
stays as defense-in-depth + cleaner error logging. Portability handled
via build-tagged constants in vault_handler_documents_nofollow_{unix,
windows}.go — O_NOFOLLOW is no-op on Windows (desktop edition is
single-user with no untrusted tenants).

**I3** — every symlink/path-escape rejection now emits
slog.Warn("security.vault_symlink_escape", ...) with the rejection
site, resolved path and workspace, matching the storage.go pattern
so oncall can alert on attempted escapes.

**I4** — POST `content` field changed from `string` to `*string` to
match PUT. Nil pointer = "no write" (metadata-only stub); empty string
= "write a 0-byte file + fire event"; non-empty = write bytes. Updated
handler godoc to spell out the symmetry. Switched `werr == os.ErrInvalid`
to `errors.Is` per repo conventions.

**Tests added** — non-master tenant first content write (B1), PUT with
nil content (locks "no change" contract), handler-level ".." path
rejection, POST content:"" writes empty file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trivial nits from thieung's review of ba59e7b:

- writeDocumentContent now closes the file explicitly and propagates
  any Close() error so a delayed FS failure (e.g. disk-full at flush)
  doesn't return success.
- Renamed slog `site=lexical` to `site=path_traversal` — the lexical
  prefix check catches "../" / absolute-path escapes, not symlinks.
  Umbrella key `security.vault_symlink_escape` stays.
- Non-master-tenant test now reuses newJSONRequest + WithContext,
  dropping the duplicate mustJSON helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@aaron-tsar aaron-tsar force-pushed the fix/vault-doc-content-on-create branch from ba59e7b to 9edcc00 Compare May 28, 2026 13:56
@aaron-tsar aaron-tsar changed the base branch from main to dev May 28, 2026 13:56
@aaron-tsar
Copy link
Copy Markdown
Contributor Author

Done — rebased onto upstream/dev, force-pushed, base retargeted from maindev.

Folded the three nits in as 9edcc00d:

  1. writeDocumentContent now Close()s the file explicitly and returns the close error, so a delayed FS failure (disk-full at flush) no longer silently returns success.
  2. Renamed slog site=lexicalsite=path_traversal (umbrella key security.vault_symlink_escape stays — the lexical check catches ../ / absolute paths, not symlinks).
  3. Non-master-tenant test now reuses newJSONRequest + WithContext; removed the duplicate mustJSON helper.

Branch state after rebase (4 commits on top of upstream/dev):

  • fc105d31 fix(vault): accept and persist content on POST/PUT vault/documents
  • 5d5be0a6 fix(vault): harden symlink containment + mirror upload enrich-progress
  • 1fad4e0b fix(vault): non-master tenant 500, O_NOFOLLOW, slog, *string content
  • 9edcc00d fix(vault): surface close errors + clarify slog label + drop test dup

go build ./internal/http/, -tags sqliteonly, go vet, and the full internal/http test package all pass on the new base. Ready for CI.

@thieung thieung merged commit 0fcf5ba into nextlevelbuilder:dev May 28, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants