Commit b1a868c: Merge pull request #63 from AgentWorkforce/fix/mount-bulk-sync-migration
fix(mount): migrate to bulk sync + fix first-write 412
2 parents fb2fee5 + 35ffb6c

8 files changed: 1987 additions & 153 deletions

Lines changed: 243 additions & 0 deletions
# mount-bulk-sync migration boundary

Scope: migrate `relayfile-mount` (`internal/mountsync`) from per-file HTTP
`PUT /fs/file` writes to batched `POST /fs/bulk` writes, and fix the
first-write 412 `missing If-Match header` storm. Cross-ref: AgentWorkforce/cloud#342.
## 1. Files touched

| File | Create/Modify | Why |
|------|---------------|-----|
| `internal/mountsync/syncer.go` | modify | Add `WriteFilesBulk` method on `HTTPClient`; extend `RemoteClient` interface with it; introduce a batching accumulator that `pushLocal` populates and flushes at end of cycle; change `handleLocalWriteOrCreate` to drop the `If-Match` header on the first-write (no tracked revision) path. |
| `internal/mountsync/syncer_test.go` | modify | Add tests for bulk batching, first-write no-`If-Match` behavior, mixed create/update batch, and partial-error handling. Extend the existing `fakeRemoteClient` (or add one) to capture `WriteFilesBulk` calls. |
| `internal/mountsync/types.go` *(or inline in `syncer.go` if no types file)* | modify | Add `BulkWriteFile` / `BulkWriteError` / `BulkWriteResponse` wire types mirroring `internal/relayfile/store.go:168-179` so `mountsync` does not import the server package. |
| `docs/mount-bulk-migration-boundary.md` | create | This boundary doc. |
| `docs/mount-bulk-migration-checklist.md` | *out of scope for this step* | Produced in the next phase. |

No `internal/httpapi/` or `internal/relayfile/` changes. The server-side
`handleBulkWrite` (`internal/httpapi/server.go:1556`) and
`relayfile.Store.BulkWrite` already accept the shape we need.
## 2. `HTTPClient.WriteFilesBulk` signature

Wire types (mirror of `internal/relayfile/store.go:168-179` plus the
server response at `internal/httpapi/server.go:1629-1634`):

```go
// BulkWriteFile matches relayfile.BulkWriteFile on the wire.
type BulkWriteFile struct {
    Path        string `json:"path"`
    ContentType string `json:"contentType"`
    Content     string `json:"content"`
    Encoding    string `json:"encoding,omitempty"`
}

// BulkWriteError matches relayfile.BulkWriteError on the wire.
type BulkWriteError struct {
    Path    string `json:"path"`
    Code    string `json:"code"`
    Message string `json:"message"`
}

// BulkWriteResponse matches the JSON body returned by
// POST /v1/workspaces/{id}/fs/bulk (status 202).
type BulkWriteResponse struct {
    Written       int              `json:"written"`
    ErrorCount    int              `json:"errorCount"`
    Errors        []BulkWriteError `json:"errors"`
    CorrelationID string           `json:"correlationId"`
}
```

Method signature:

```go
func (c *HTTPClient) WriteFilesBulk(
    ctx context.Context,
    workspaceID string,
    files []BulkWriteFile,
) (BulkWriteResponse, error)
```

Implementation notes:
- Route: `POST /v1/workspaces/{workspaceID}/fs/bulk` (confirmed at
  `internal/httpapi/server.go:161-163`). No `forkId` query param — mountsync
  does not use forks.
- Body: `{"files": [...]}`; `Content-Type: application/json`; reuse
  `c.doJSON` for retry/backoff parity with the existing single-file write.
- Empty `files` is a caller bug — return early with a typed sentinel; do
  not round-trip a guaranteed 400.
- Add the method to the `RemoteClient` interface at
  `internal/mountsync/syncer.go:96` so `fakeRemoteClient` in tests can
  implement it.

## 3. Syncer batching strategy

**Concurrency invariant (preserve as-is):** `pushLocal`
(`internal/mountsync/syncer.go:1341`) already serializes writes within one
sync cycle by iterating `localRemotePaths` in sorted order, one goroutine,
no parallelism. The batching layer lives **between** the per-file handler
and `HTTPClient`; it does not introduce concurrency.

**Batch, don't stream.** Replace the per-file `s.client.WriteFile(...)`
call inside `pushSingleFile` (`internal/mountsync/syncer.go:599`) with an
enqueue into an in-cycle accumulator owned by the `Syncer`:

```go
type pendingBulkWrite struct {
    remotePath  string
    localPath   string
    snapshot    localSnapshot
    tracked     trackedFile
    exists      bool
    contentType string
    content     string
}
```

Rules:

1. **Full-cycle bulk** — the primary path. `pushLocal` accumulates every
   write that would have gone through `pushSingleFile` into a slice and
   flushes it with a single `WriteFilesBulk` call after the create/update
   loop completes but before the deletion loop (line 1420+). This is the
   common case (initial seed / mass local-edit storm) and collapses N HTTP
   requests into one.
2. **Chunk cap** — if the accumulator reaches `bulkFlushThreshold` (start
   with 256 files) during the loop, flush eagerly and reset. Prevents
   unbounded request bodies on large `scanLocalFiles` results. Constant,
   not configurable in this migration.
3. **Per-file stays** for the fsnotify fast path
   (`handleLocalWriteOrCreate`, `internal/mountsync/syncer.go:497,524`).
   Single-event writes should still go as a batch of one through
   `WriteFilesBulk` (same endpoint, same code path — cheap) rather than
   re-enter the old `WriteFile` path. This keeps the `If-Match` header off
   the single-file PUT entirely and gives us one write code path to reason
   about.
4. **Conflict / read-after-write reconciliation** stays per-file.
   `WriteFilesBulk` does not return a new revision (the server response
   has no per-file revision; see `handleBulkWrite` at
   `internal/httpapi/server.go:1629-1634`). After a successful bulk flush,
   we must `ReadFile` each written path to refresh `tracked.Revision` and
   `tracked.Hash` so subsequent single-file operations (delete, conflict
   recovery) still have the optimistic-concurrency token they need. This
   read-back happens sequentially inside the cycle; we pay one GET per
   write, but we save N−1 PUTs and dodge 412s.
5. **Deletes are NOT batched** in this migration. `DeleteFile`
   (`internal/mountsync/syncer.go:191,734,1432`) still requires
   `If-Match`, and the server has no bulk-delete endpoint. Keep the
   existing per-file delete loop untouched.

## 4. First-write 412 fix

**Production symptom** (from the collect-context log):
`http 412 precondition_failed: missing If-Match header` firing on the first
write of a new file, then `context deadline exceeded` cascading as
retries back up.

**Root cause.** `pushSingleFile` at `internal/mountsync/syncer.go:592-599`
sets `baseRevision := "0"` as a "create-if-absent" sentinel and passes it
into `HTTPClient.WriteFile`, which unconditionally sets
`If-Match: "0"` at line 184. The server's single-file write endpoint
rejects non-empty, non-matching `If-Match` values on a path that does not
yet exist, surfacing as 412 `missing If-Match header`.

**Chosen fix: route all writes through `WriteFilesBulk` (Option A).**

The bulk endpoint (`handleBulkWrite`,
`internal/httpapi/server.go:1556`) does **not** consult `If-Match` at all
— it treats every entry as an upsert. Migrating all writes to bulk
*structurally removes* the 412 storm: there is no `If-Match` header left
on the write path for the server to reject.

The alternative (Option B: fetch the current revision when `baseRevision`
is empty/`"0"`) was rejected because:

- it adds a speculative GET in front of every first write, doubling
  latency on the hot path;
- it races with concurrent remote creates (the GET returns 404, we write
  with `If-Match: "0"`, and we are right back to the same 412);
- it leaves two write code paths (bulk for batches, per-file for
  singletons) — more code, more drift risk.

Concretely, `pushSingleFile` no longer calls `s.client.WriteFile` for the
create/update case. It enqueues into the bulk accumulator; the accumulator
flushes via `WriteFilesBulk`; the read-back loop (strategy rule 4)
refreshes revisions. `HTTPClient.WriteFile` stays on the struct for now
(used by `revertReadonlyFile` and conflict recovery paths that already
have a known revision), but its use shrinks to paths that always have a
non-sentinel revision — so `If-Match: "0"` never hits the wire again.

## 5. New test cases

Added to `internal/mountsync/syncer_test.go`. Each test uses a fake
`RemoteClient` that records calls; none hit the real httpapi server
except where noted.

1. **`TestBulkWrite_SingleCallForNFiles`** — local dir has 5 new files;
   after `SyncOnce`, the fake client records exactly one `WriteFilesBulk`
   call containing all 5 paths and zero `WriteFile` calls.
2. **`TestBulkWrite_FirstWriteNoIfMatch`** — against the real
   `newMountsyncAPIHandler` (same pattern as `TestBulkSeedThenSync`,
   `internal/mountsync/syncer_test.go:441`), seed nothing, write one new
   file locally, `SyncOnce`, assert the file lands on the server with
   HTTP 202 and no 412 is observed in the response log.
3. **`TestBulkWrite_ChunkAtThreshold`** — local dir has 300 files with
   `bulkFlushThreshold` forced to 256 via a test hook; assert exactly two
   `WriteFilesBulk` calls with sizes 256 and 44.
4. **`TestBulkWrite_PartialErrors`** — fake server returns a
   `BulkWriteResponse` with `errorCount=1` and one `forbidden` entry;
   assert the successful paths update `state.Files[remotePath].Revision`
   (via the read-back) while the failed path retains its prior tracked
   state and logs a denial, and the next `SyncOnce` does **not** re-push
   the denied file (matches the existing `WriteDenied`/`DeniedHash` skip
   at `internal/mountsync/syncer.go:1406`).
5. **`TestBulkWrite_DeletesStayPerFile`** — local dir drops a previously
   tracked file; assert `WriteFilesBulk` is NOT called for the delete and
   one `DeleteFile` call is issued with the tracked `If-Match` revision.

## 6. Deliberately out of scope

- Any change under `internal/httpapi/` or `internal/relayfile/` — the
  server-side `POST /fs/bulk` and `Store.BulkWrite` are already correct
  for our needs.
- Bulk delete. No server endpoint; per-file `DeleteFile` stays.
- FUSE / kernel mount layer. This migration is HTTP-client-only.
- Fork-aware bulk writes (`BulkWriteFork`). Mountsync does not target
  forks.
- Websocket event path (`websocketURL`, `syncer.go:1706`). Untouched.
- Retry/backoff tuning in `HTTPClient.doJSON`. Reuse as-is.
- Content encoding / binary handling beyond what
  `BulkWriteFile.Encoding` already carries.

## 7. Acceptance gates

Before commit, all of the following must pass locally from the repo root.

```bash
# 1. Package compiles and all mountsync tests pass, including new ones.
go test ./internal/mountsync/... -count=1

# 2. Targeted run of the bulk-migration tests (must all pass, no skips).
go test ./internal/mountsync/ -run 'TestBulkWrite_|TestBulkSeedThenSync' -count=1 -v

# 3. No residual per-file write call inside the push loop.
# Expect: zero matches.
grep -n 's.client.WriteFile(' internal/mountsync/syncer.go | grep -v revertReadonlyFile | grep -v conflict

# 4. WriteFilesBulk is wired into the interface and the HTTPClient.
# Expect: at least one match in each of these files.
grep -n 'WriteFilesBulk' internal/mountsync/syncer.go
grep -n 'WriteFilesBulk' internal/mountsync/syncer_test.go

# 5. No If-Match: "0" sentinel left on the wire.
# Expect: zero matches. (-r: the target is a directory.)
grep -rnE '"If-Match":\s*"0"' internal/mountsync/

# 6. Wider suite still green (regression check on httpapi + relayfile).
go test ./internal/httpapi/... ./internal/relayfile/... -count=1
```

Gate 3's filter is tolerant of the two legitimate remaining callers
(`revertReadonlyFile`, conflict-recovery read-after-write); if either is
renamed, update the grep alongside the rename in the same commit.

BOUNDARY_READY
Lines changed: 142 additions & 0 deletions

# mount-bulk-sync migration — reviewer verdict

Reviewed against `docs/mount-bulk-migration-boundary.md` on 2026-04-24.
Diff was read fresh; any prior contents of this file have been
discarded.

## 1. Scope / files touched

`git diff --name-only` → 5 modified paths, plus untracked files:

| File | Status | In boundary §1? |
|------|--------|-----------------|
| `internal/mountsync/syncer.go` | modified | ✅ yes |
| `internal/mountsync/syncer_test.go` | modified | ✅ yes |
| `internal/mountsync/types.go` | new (untracked) | ✅ yes |
| `docs/mount-bulk-migration-boundary.md` | new (untracked) | ✅ yes (this is the doc) |
| `internal/mountsync/http_client_test.go` | modified | ⚠️ not listed, but tightly scoped |
| `.trajectories/…`, `package-lock.json`, `workflows/*.ts` | incidental | ignored (not code under review) |

`http_client_test.go` is not named in §1 but only adds two unit tests
for the new `HTTPClient.WriteFilesBulk` method
(`TestHTTPClientWriteFilesBulkUsesSinglePOSTWithAllFiles`,
`TestHTTPClientWriteFilesBulkRejectsEmptyBatch`). That is a direct test
of the deliverable in §2 and matches the pattern of neighbouring tests
in the same file. Not scope creep.

Minor: `types.go` imports `internal/relayfile` to type-alias
`BulkWriteFile` and `BulkWriteError`. The boundary said mountsync must
not import the *server* (`internal/httpapi`) package; it does not
forbid `internal/relayfile`, which is the wire-type source of truth.
Acceptable.

## 2. Concurrency safety

`pushLocal` (`internal/mountsync/syncer.go:1488`) serializes writes
exactly as before: single goroutine, sorted `localRemotePaths`, no
parallel fan-out. The new accumulator `pendingWrites` is a **local
slice** inside `pushLocal`, not a field on `Syncer`, so there is no
shared mutable state across goroutines and nothing new that the
existing `Syncer.mu` needs to guard. The cycle-level serialization
invariant from the boundary doc is preserved.

## 3. 412 first-write fix — no pre-write GET

`preparePendingBulkWrite` no longer constructs a `baseRevision := "0"`
sentinel and no longer calls `s.client.WriteFile` — it enqueues into
the bulk accumulator. The bulk endpoint does not consult `If-Match`,
so no `If-Match: "0"` ever hits the wire (gate 5 greps confirm zero
matches). No speculative GET is issued before writes. The post-flush
`ReadFile` in `reconcileBulkWrite` runs **after** a successful bulk
write, only when the response did not carry a revision — this matches
boundary strategy rule 4 (“one GET per write, sequential, inside the
cycle; we save N−1 PUTs and dodge 412s”). Bounded by the number of
files just written, not unbounded, and never precedes the write.

## 4. Per-file error handling

`flushPendingBulkWrites` (`syncer.go:~660`) walks both the success
paths and the `errors` slice and dispatches each entry independently:

- `forbidden` / `conflict` / per-file codes are translated via
  `bulkWriteErrorAsError` into the same `*HTTPError` / `*ConflictError`
  shapes the single-file path already produced, then handled by
  `handleWriteError` — which preserves the prior behaviour
  (`WriteDenied`/`DeniedHash` record for create-forbidden, conflict
  reconciliation with a `ReadFile`).
- An unrecognized code captured into `firstErr` **does not abort the
  batch**: the loop keeps reconciling the remaining entries and only
  returns `firstErr` at the end.

`TestBulkWrite_PartialErrors` and
`TestBulkWrite_PerFileErrorMarksDirtyForRetry` exercise this and pass.
Criterion satisfied.

## 5. Post-flush revision read-back

`reconcileBulkWrite` (`syncer.go:~720`) updates
`s.state.Files[path].Revision` from, in order:

1. `response.revisionsByPath()[path]` — revisions surfaced by the
   server in the bulk response (`Results[].Revision` or
   `Files[].Revision`).
2. If the server omitted a revision, a post-write `s.client.ReadFile`
   on that path.

It **never** reuses the pre-write `tracked.Revision` as the stored
revision, which is exactly what the boundary mandated. Confirmed by
reading the function end-to-end: the `trackedFile` literal written to
`s.state.Files` uses the local `revision` variable (populated from the
response or the read-back), `pendingWrite.snapshot.Hash` for the
content hash, and `Dirty: false`.

Today the real server does not return per-file revisions
(`internal/httpapi/server.go:1607-1612` — only
`{written, errorCount, errors, correlationId}`), so production will
exercise the `ReadFile` branch for every successful write. That is
the expected path per strategy rule 4. The `Results`/`Files`
revision-harvest branch is dead on the current server but
future-proof, and `TestBulkWrite_ReconcileUsesResponseRevision` pins
it down.

Nit (non-blocking): the contentType fallback in `reconcileBulkWrite`

```go
contentType := strings.TrimSpace(pendingWrite.snapshot.ContentType)
if contentType == "" {
    contentType = pendingWrite.snapshot.ContentType
}
```

is a no-op — the fallback reassigns the same untrimmed value, which is
empty (or all whitespace) whenever the trimmed value is empty. Harmless,
but cosmetic dead code.

## 6. Build / vet / fmt / test gates

All acceptance gates pass locally:

- `go build ./internal/mountsync/...` — clean.
- `go vet ./internal/mountsync/...` — clean.
- `gofmt -l internal/mountsync/` — clean.
- `go test ./internal/mountsync/... -count=1` → `ok 2.280s`.
- `go test ./internal/mountsync/ -run 'TestBulkWrite_|TestBulkSeedThenSync' -count=1 -v`
  — all 10 `TestBulkWrite_*` + `TestBulkSeedThenSync` pass, including
  `_SingleCallForNFiles`, `_FirstWriteNoIfMatch`, `_ChunkAtThreshold`,
  `_PartialErrors`, `_DeletesStayPerFile`.
- `go test ./internal/httpapi/... ./internal/relayfile/...` — clean
  (regression gate).
- `grep 's.client.WriteFile(' internal/mountsync/syncer.go` → **zero
  matches** inside the push loop (gate 3).
- `grep 'If-Match.*0'` inside `internal/mountsync/` → zero matches
  (gate 5).

## Summary

The implementation matches the boundary doc on every criterion this
review had to verify: scoped file list, preserved cycle-level
serialization, no `If-Match: "0"` on the wire, no pre-write GET,
per-file error surfacing without batch abortion, and post-flush
revision refresh sourced from the response (with a bounded
`ReadFile` fallback).

REVIEW_APPROVED
