fix(vault): accept and persist content on POST/PUT vault/documents#1174
Conversation
|
@clark-cant Review this PR |
PR ReviewVerdict: REQUEST CHANGES SummaryThis fixes a real gap: JSON Risk level: medium-high because this introduces direct filesystem writes from JSON input. Findings
The current check only verifies the lexical absolute path: target := filepath.Join(workspace, filepath.Clean(relPath))
absTarget, _ := filepath.Abs(target)
absWS, _ := filepath.Abs(workspace)
strings.HasPrefix(absTarget, absWS + string(os.PathSeparator))This blocks The PR body says symlink targets outside the workspace are rejected, but the implementation does not enforce that yet. Please resolve symlinks for the parent/target path before writing, or use a no-symlink-safe file creation strategy. Add a regression test for a symlink inside the workspace pointing outside.
If enrichment progress/UI depends on this, JSON
Please add tests for at least:
Anti-AI-Slop Check
Tests / CI
Final NotesGood fix direction, but filesystem writes need to be airtight before merge. The symlink escape issue is blocking. |
Addresses review on nextlevelbuilder#1174: - writeDocumentContent now resolves symlinks before writing. A symlink living inside the workspace but pointing outside (e.g. workspace/link -> /etc) previously passed the lexical prefix check and os.WriteFile followed it out of bounds. We now EvalSymlinks the workspace root and the deepest existing ancestor of the target, re-check containment before any mkdir/write, and refuse a symlink planted at the final component via Lstat. - publishDocUpserted starts enrichment progress (enrichProgress.Start) before publishing EventVaultDocUpserted, matching handleUpload's ordering so the worker pool can't drain the event before the progress tracker registers it. - Add regression tests: symlink dir escape, final-component symlink, parent escape, empty content, disallowed extension, event-after-upsert ordering, metadata-only no-write, and upsert-failure orphan behaviour. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Thanks @clark-cant — addressed all three in 9349da8. [HIGH] Symlink escape —
Regression tests cover the symlinked-directory escape and the final-component-symlink case (plus [MEDIUM] Progress vs. publish race — [LOW] Tests — new Builds clean on both |
thieung
left a comment
There was a problem hiding this comment.
Thanks for catching this — the dropped content field is real and the fix matches the upload path closely. A few blockers and hardening gaps before this is ready to merge; inline notes below.
Blocking
B1 — non-master tenants get HTTP 500 on first content write. resolveTenantWorkspace returns workspace/tenants/{slug}/ for any non-master tenant (internal/config/tenant_paths.go:38–53), and that directory is only created on demand by handleUpload/handleRescan. filepath.EvalSymlinks on a missing path returns os.ErrNotExist, so the very first JSON POST/PUT with content for a fresh tenant 500s. All tests use the master tenant, where TenantWorkspace returns workspace unchanged, so the regression isn't covered. See inline at line 28.
Important
- I1 TOCTOU between
LstatandWriteFile— PR is sold as symlink hardening, but the final write still has noO_NOFOLLOW. Inline at line 72. - I3 Rejected escape attempts don't emit a
security.*slog event — peer surface (internal/http/storage.go:158, :175) does. Inline at line 36. - I4 POST/PUT
contentsemantics are asymmetric (stringvs*string,""means "absent" on POST but "clear file + fire event" on PUT). Inline at line 219.
Test gaps
- No non-master tenant test (the B1 hotspot).
- No test pinning PUT
Content = nil→ no write, no event. - No test for handler-level
..rejection at:232.
The core change (hash + enrichment ordering + lexical containment) is correct and well-tested for the master-tenant path — B1 is mainly about closing the multi-tenant gap before merge.
| func (h *VaultHandler) writeDocumentContent(workspace, relPath string, content []byte) (string, error) { | ||
| // Resolve the workspace root through any symlinks so all containment checks | ||
| // compare canonical paths. | ||
| realWS, err := filepath.EvalSymlinks(workspace) |
There was a problem hiding this comment.
Blocking (B1). EvalSymlinks returns os.ErrNotExist for a path that doesn't exist yet, and for non-master tenants workspace here is workspace/tenants/{slug}/ which is only created on demand by handleUpload (vault_handler_upload.go:109 does os.MkdirAll(...) before writing). On a fresh non-master tenant whose first vault interaction is this endpoint, the call 500s with failed to write content.
Suggested fix: either os.MkdirAll(workspace, 0o755) before EvalSymlinks, or fall through to lexical containment on errors.Is(err, os.ErrNotExist). Either way, please add a regression test with a non-master tenant UUID — current tests all use masterTenant where TenantWorkspace returns workspace unchanged, so this path isn't exercised.
| return "", os.ErrInvalid | ||
| } | ||
|
|
||
| if err := os.WriteFile(target, content, 0o644); err != nil { |
There was a problem hiding this comment.
Important (I1). The Lstat check above followed by os.WriteFile is a TOCTOU window: os.WriteFile opens with O_WRONLY|O_CREATE|O_TRUNC (no O_NOFOLLOW), so a tenant who can race the request can swap a regular file for a symlink between the two calls. Same race exists in handleUpload (vault_handler_upload.go:164 uses os.Create), so it isn't a new bug — but since this PR explicitly bills itself as symlink-containment hardening, it's worth closing here.
Suggestion: replace this os.WriteFile with os.OpenFile(dst, os.O_WRONLY|os.O_CREATE|os.O_TRUNC|syscall.O_NOFOLLOW, 0o644) then write + close, or reuse the evalSymlinkOrClean + pathWithinDir helpers in internal/http/storage.go:131–148 which already handle this surface.
|
|
||
| // Lexical containment: blocks "../" and absolute-path escapes. | ||
| if target != realWS && !strings.HasPrefix(target, realWS+string(os.PathSeparator)) { | ||
| return "", os.ErrInvalid |
There was a problem hiding this comment.
Important (I3). This (and lines 49, 69) reject symlink/path-escape attempts by returning os.ErrInvalid, but the only resulting log line is the generic vault.create: write content failed at line 299 — oncall can't alert on attempted escapes. Compare with internal/http/storage.go:158 (slog.Warn("security.storage_symlink_escape", "resolved", ..., "base", ...)) and :175 — the codebase pattern is a security.* slog at the rejection site itself. Suggest adding slog.Warn("security.vault_symlink_escape", ...) with the resolved path + workspace at each of the three rejection points before returning.
| var body struct { | ||
| Path string `json:"path"` | ||
| Title string `json:"title"` | ||
| Content string `json:"content"` // optional; when set, written to disk |
There was a problem hiding this comment.
Important (I4). POST uses Content string while PUT (line 351) uses Content *string. Result: on POST, content: "" and "content not provided" are indistinguishable (both skip the write); on PUT, content: "" is a destructive action — it writes a 0-byte file, sets content_hash = sha256(""), and fires EventVaultDocUpserted. A caller who copy-pastes a payload between the two endpoints will be surprised.
Suggest either: (a) make POST also *string and treat nil vs "" consistently with PUT, or (b) reject content: "" on PUT with a 400 (and use DELETE for clearing) so the API only writes when bytes were actually sent. Whichever you pick, please document it in the handler godoc — the comment at line 351 documents PUT's behavior but not the asymmetry.
|
|
||
| // masterTenant resolves TenantWorkspace to h.workspace unchanged, keeping the | ||
| // content-write tests pointed at a single temp dir. | ||
| var masterTenant = uuid.MustParse("0193a5b0-7000-7000-8000-000000000001") |
There was a problem hiding this comment.
Test coverage gap. Every handler-level test uses masterTenant, which is the one case where config.TenantWorkspace returns workspace unchanged (see internal/config/tenant_paths.go:39). That hides the B1 regression entirely. Please add at least one test with a non-master tenant UUID + tenant slug context (via store.WithTenantID + store.WithTenantSlug) covering the first content write into a fresh workspace/tenants/{slug}/ dir.
Also missing:
- A test pinning PUT with
Content = nil→ no write, no event (locks the "no change" contract). - A handler-level test for path containing
..exercising the rejection atvault_handler_documents.go:232(the innerwriteDocumentContentparent-escape test doesn't cover the handler short-circuit).
|
Backlog review: high-value but not merge-ready yet. The bug is real: content in vault document POST/PUT was accepted but not persisted, and this PR adds path/symlink containment plus enrichment event coverage. Current blockers: PR is BLOCKED/no checks reported and targets main, so retarget/rebase onto dev and run CI first. One behavior to confirm while rebasing: create currently treats content:"" as omitted even though update supports empty content and writeDocumentContent has an empty-file test. |
Addresses second review on nextlevelbuilder#1174 (thieung): **B1 (blocking)** — first JSON content write into a fresh non-master tenant returned 500 because workspace/tenants/{slug}/ doesn't exist yet and EvalSymlinks fails with ErrNotExist. writeDocumentContent now MkdirAlls the workspace before resolving (mirrors handleUpload). Added a regression test that drives a non-master tenant context end-to-end and asserts the file lands under workspace/tenants/{slug}/. **I1** — replaced the Lstat→WriteFile sequence (TOCTOU window) with os.OpenFile(O_WRONLY|O_CREATE|O_TRUNC|O_NOFOLLOW). The Lstat short-fail stays as defense-in-depth + cleaner error logging. Portability handled via build-tagged constants in vault_handler_documents_nofollow_{unix, windows}.go — O_NOFOLLOW is no-op on Windows (desktop edition is single-user with no untrusted tenants). **I3** — every symlink/path-escape rejection now emits slog.Warn("security.vault_symlink_escape", ...) with the rejection site, resolved path and workspace, matching the storage.go pattern so oncall can alert on attempted escapes. **I4** — POST `content` field changed from `string` to `*string` to match PUT. Nil pointer = "no write" (metadata-only stub); empty string = "write a 0-byte file + fire event"; non-empty = write bytes. Updated handler godoc to spell out the symmetry. Switched `werr == os.ErrInvalid` to `errors.Is` per repo conventions. **Tests added** — non-master tenant first content write (B1), PUT with nil content (locks "no change" contract), handler-level ".." path rejection, POST content:"" writes empty file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Thanks @thieung, all findings addressed in ba59e7b. B1 (blocking) — I1 (TOCTOU) — replaced
If the open returns ELOOP, the request is rejected with I considered reusing I3 — every rejection site now emits I4 — POST
Also switched New tests (4):
Full @mrgoonie — re: retargeting onto |
All four blocking/important findings addressed in ba59e7b with full test coverage. Re-reviewing as approve.
thieung
left a comment
There was a problem hiding this comment.
Approving — addresses all four findings from the prior review with full test coverage:
- B1 — before
EvalSymlinks(vault_handler_documents.go:33);TestHandleCreateDocument_NonMasterTenantFirstContentWriteexercises the slug-resolved path end-to-end. - I1 —
os.WriteFilereplaced withO_WRONLY|O_CREATE|O_TRUNC|oNoFollowopen via build-taggedoNoFollowconstants (unix=syscall.O_NOFOLLOW, windows=0 with documented degradation).ELOOPmapped toos.ErrInvalidfor clean 400. - I3 — Four
slog.Warn("security.vault_symlink_escape", "site", ...)lines covering all rejection sites (lexical / ancestor / final_lstat / open_nofollow); verified live in test logs. - I4 — POST
Contentis now*string, symmetric with PUT; godoc enumerates the three states (nil /""/ non-empty);errors.Isswap done.
All tests pass on PG and sqliteonly builds; vet clean.
Three trivial non-blocking nits if you ever revisit:
defer f.Close()atvault_handler_documents.go:91drops close errors — wouldn't affect security, but a flushed-disk-full would still return success. Trivial fix.site=lexicalslog label is technically a path traversal, not a symlink — cosmetic; the alert key still fits.mustJSONtest helper duplicatesjson.MarshalinnewJSONRequest— minor DRY.
Nice work on the build-tagged Windows fallback and the explicit godoc on the empty-string contract.
thieung
left a comment
There was a problem hiding this comment.
Approving — ba59e7b0 addresses all four findings from the prior review with full test coverage:
- B1 —
os.MkdirAll(workspace, 0o755)beforeEvalSymlinks(vault_handler_documents.go:33);TestHandleCreateDocument_NonMasterTenantFirstContentWriteexercises the slug-resolved path end-to-end. - I1 —
os.WriteFilereplaced withO_WRONLY|O_CREATE|O_TRUNC|oNoFollowopen via build-taggedoNoFollowconstants (unix=syscall.O_NOFOLLOW, windows=0 with documented degradation).ELOOPmapped toos.ErrInvalidfor clean 400. - I3 — Four
slog.Warn("security.vault_symlink_escape", "site", ...)lines covering all rejection sites (lexical / ancestor / final_lstat / open_nofollow); verified live in test logs. - I4 — POST
Contentis now*string, symmetric with PUT; godoc enumerates the three states (nil /""/ non-empty);errors.Isswap done.
All tests pass on PG and sqliteonly builds; vet clean.
Three trivial non-blocking nits if you ever revisit:
defer f.Close()atvault_handler_documents.go:91drops close errors — wouldn't affect security, but a flushed-disk-full would still return success. Trivial fix.site=lexicalslog label is technically a path traversal, not a symlink — cosmetic; the alert key still fits.mustJSONtest helper duplicatesjson.MarshalinnewJSONRequest— minor DRY.
Nice work on the build-tagged Windows fallback and the explicit godoc on the empty-string contract.
Duplicate — shell-escape mishap. See review 4381446173 for the intended approve body.
Yes pls |
Previously POST /v1/{agents/{id}/}vault/documents and the matching PUT
endpoint declared a body struct without a `content` field, so callers
passing JSON `{"path": ..., "title": ..., "content": "..."}` got back
HTTP 201 with a doc id, but the bytes were silently discarded:
- bindJSON ignored the `content` key (no matching field)
- the document row was inserted with content_hash="" and no file on disk
- the enrichment pipeline (summary, embeddings, links) was never
triggered because no EventVaultDocUpserted was published
The endpoint was effectively a metadata-only create — there was no client
path that combined "register doc + supply bytes" via JSON. /v1/vault/upload
exists, but it requires multipart, flattens paths to basename, and is
inconvenient for programmatic clients that already have UTF-8 text in
memory (CI jobs, agent skills, MCP tools).
This change makes `content` an optional field on both handlers:
- POST: when content is non-empty, validate the file extension against
the same allowedUploadExts whitelist used by /v1/vault/upload, write
bytes to <tenant-workspace>/<path>, compute SHA-256, store it as
content_hash on the doc, and publish EventVaultDocUpserted so the
enrichment worker picks it up — exactly the same downstream
behaviour as /v1/vault/upload.
- PUT: same logic; nil content = no change, non-nil content = rewrite
the file on disk (and an empty string truncates it).
- Omitting `content` preserves the historical metadata-only behaviour
(e.g. for callers that materialise files via filesystem rescan).
- Path validation reuses the existing rules: no `..`, no absolute
paths; an extra abs-path check rejects symlink escapes from the
workspace root.
Two small helpers (writeDocumentContent, publishDocUpserted) factor out
the disk write + event publish so both handlers share the same code,
and so they can be unit-tested in isolation later.
The fix is backwards-compatible: existing callers that don't send
`content` continue to work unchanged. Existing docs created without
content remain metadata-only stubs; a subsequent PUT with `content`
will materialise them.
Verified live on a development instance:
POST → row inserted, file at <workspace>/<path>, SHA-256 matches,
enrichment event published.
PUT → file rewritten, content_hash updated, updated_at refreshed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses review on nextlevelbuilder#1174: - writeDocumentContent now resolves symlinks before writing. A symlink living inside the workspace but pointing outside (e.g. workspace/link -> /etc) previously passed the lexical prefix check and os.WriteFile followed it out of bounds. We now EvalSymlinks the workspace root and the deepest existing ancestor of the target, re-check containment before any mkdir/write, and refuse a symlink planted at the final component via Lstat. - publishDocUpserted starts enrichment progress (enrichProgress.Start) before publishing EventVaultDocUpserted, matching handleUpload's ordering so the worker pool can't drain the event before the progress tracker registers it. - Add regression tests: symlink dir escape, final-component symlink, parent escape, empty content, disallowed extension, event-after-upsert ordering, metadata-only no-write, and upsert-failure orphan behaviour. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses second review on nextlevelbuilder#1174 (thieung): **B1 (blocking)** — first JSON content write into a fresh non-master tenant returned 500 because workspace/tenants/{slug}/ doesn't exist yet and EvalSymlinks fails with ErrNotExist. writeDocumentContent now MkdirAlls the workspace before resolving (mirrors handleUpload). Added a regression test that drives a non-master tenant context end-to-end and asserts the file lands under workspace/tenants/{slug}/. **I1** — replaced the Lstat→WriteFile sequence (TOCTOU window) with os.OpenFile(O_WRONLY|O_CREATE|O_TRUNC|O_NOFOLLOW). The Lstat short-fail stays as defense-in-depth + cleaner error logging. Portability handled via build-tagged constants in vault_handler_documents_nofollow_{unix, windows}.go — O_NOFOLLOW is no-op on Windows (desktop edition is single-user with no untrusted tenants). **I3** — every symlink/path-escape rejection now emits slog.Warn("security.vault_symlink_escape", ...) with the rejection site, resolved path and workspace, matching the storage.go pattern so oncall can alert on attempted escapes. **I4** — POST `content` field changed from `string` to `*string` to match PUT. Nil pointer = "no write" (metadata-only stub); empty string = "write a 0-byte file + fire event"; non-empty = write bytes. Updated handler godoc to spell out the symmetry. Switched `werr == os.ErrInvalid` to `errors.Is` per repo conventions. **Tests added** — non-master tenant first content write (B1), PUT with nil content (locks "no change" contract), handler-level ".." path rejection, POST content:"" writes empty file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trivial nits from thieung's review of ba59e7b: - writeDocumentContent now closes the file explicitly and propagates any Close() error so a delayed FS failure (e.g. disk-full at flush) doesn't return success. - Renamed slog `site=lexical` to `site=path_traversal` — the lexical prefix check catches "../" / absolute-path escapes, not symlinks. Umbrella key `security.vault_symlink_escape` stays. - Non-master-tenant test now reuses newJSONRequest + WithContext, dropping the duplicate mustJSON helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ba59e7b to
9edcc00
Compare
|
Done — rebased onto Folded the three nits in as
Branch state after rebase (4 commits on top of
|
Summary
POST /v1/agents/{id}/vault/documents(and its tenant-wide twin) declared a body struct without acontentfield, so JSON payloads carryingcontentgot HTTP 201 with a doc id but the bytes were silently discarded —content_hashstayed empty, no file was written, and the enrichment pipeline never fired.This makes
contentan optional field on bothPOSTandPUT. When present, the bytes land on disk under<tenant-workspace>/<path>, a SHA-256 hash is stored on the row, and anEventVaultDocUpsertedis published — the same downstream behaviour asPOST /v1/vault/upload, but available from JSON callers that don't want to bother with multipart.Omitting
contentpreserves the historical metadata-only behaviour, so existing callers (e.g. those that materialise files via filesystem rescan) keep working.Why
Programmatic clients that already have UTF-8 text in memory — CI jobs, IDE agent skills, MCP tools — have no clean way to populate the vault today:
POST /v1/{agents/{id}}/vault/documentsaccepts JSON but drops the content.POST /v1/vault/uploadaccepts multipart but flattens paths to basename and is awkward to build from a non-form HTTP client.I hit this writing a per-repo "ingest" tool for Claude Code skills against a self-hosted goclaw — markdown files were registered but every doc was an empty shell.
Changes
internal/http/vault_handler_documents.gohandleCreateDocument: addContent stringto the body struct; when set, validate extension against the existingallowedUploadExtswhitelist, write to disk inside the tenant workspace (with abs-path escape check), setContentHash, and publishEventVaultDocUpsertedafter the DB upsert so the enrichment worker can locate the row.handleUpdateDocument: addContent *string(nil = no change). Same on-disk write/hash/event flow when set.writeDocumentContent,publishDocUpserted) so both handlers share the implementation.No store-layer, schema, or event-bus changes. No new dependencies.
Backwards compatibility
100% — callers that don't send
contentget the exact prior behaviour. Existing metadata-only rows on disk remain valid; a subsequentPUTwithcontentmaterialises them.Test plan
go build ./...cleango test ./internal/http/passesPOSTwithcontent→ DB row inserted, file at<workspace>/<path>, SHA-256 matches request body,content_hashpopulated.PUTwithcontent→ file rewritten,content_hashupdated.POSTwithoutcontent→ unchanged behaviour: emptycontent_hash, no file on disk.../.., absolute, symlink target outside workspace) → rejected with HTTP 400.🤖 Generated with Claude Code