# mount-bulk-sync migration boundary

Scope: migrate `relayfile-mount` (`internal/mountsync`) from per-file HTTP
`PUT /fs/file` writes to batched `POST /fs/bulk` writes, and fix the
first-write 412 `missing If-Match header` storm. Cross-ref: AgentWorkforce/cloud#342.

## 1. Files touched

| File | Create/Modify | Why |
|------|---------------|-----|
| `internal/mountsync/syncer.go` | modify | Add a `WriteFilesBulk` method on `HTTPClient`; extend the `RemoteClient` interface with it; introduce a batching accumulator that `pushLocal` populates and flushes at the end of a cycle; change `handleLocalWriteOrCreate` to send single-event writes as a batch of one through `WriteFilesBulk`, dropping the `If-Match` header on the first-write (no tracked revision) path. |
| `internal/mountsync/syncer_test.go` | modify | Add tests for bulk batching, first-write no-`If-Match` behavior, a mixed create/update batch, and partial-error handling. Extend the existing `fakeRemoteClient` (or add one) to capture `WriteFilesBulk` calls. |
| `internal/mountsync/types.go` *(or inline in `syncer.go` if no types file)* | modify | Add `BulkWriteFile` / `BulkWriteError` / `BulkWriteResponse` wire types mirroring `internal/relayfile/store.go:168-179` so `mountsync` does not import the server package. |
| `docs/mount-bulk-migration-boundary.md` | create | This boundary doc. |
| `docs/mount-bulk-migration-checklist.md` | *out of scope for this step* | Produced in the next phase. |

No `internal/httpapi/` or `internal/relayfile/` changes. The server-side
`handleBulkWrite` (`internal/httpapi/server.go:1556`) and
`relayfile.Store.BulkWrite` already accept the shape we need.

## 2. `HTTPClient.WriteFilesBulk` signature

Wire types (a mirror of `internal/relayfile/store.go:168-179` plus the
server response at `internal/httpapi/server.go:1629-1634`):

```go
// BulkWriteFile matches relayfile.BulkWriteFile on the wire.
type BulkWriteFile struct {
    Path        string `json:"path"`
    ContentType string `json:"contentType"`
    Content     string `json:"content"`
    Encoding    string `json:"encoding,omitempty"`
}

// BulkWriteError matches relayfile.BulkWriteError on the wire.
type BulkWriteError struct {
    Path    string `json:"path"`
    Code    string `json:"code"`
    Message string `json:"message"`
}

// BulkWriteResponse matches the JSON body returned by
// POST /v1/workspaces/{id}/fs/bulk (status 202).
type BulkWriteResponse struct {
    Written       int              `json:"written"`
    ErrorCount    int              `json:"errorCount"`
    Errors        []BulkWriteError `json:"errors"`
    CorrelationID string           `json:"correlationId"`
}
```

Method signature:

```go
func (c *HTTPClient) WriteFilesBulk(
    ctx context.Context,
    workspaceID string,
    files []BulkWriteFile,
) (BulkWriteResponse, error)
```

Implementation notes:
- Route: `POST /v1/workspaces/{workspaceID}/fs/bulk` (confirmed at
  `internal/httpapi/server.go:161-163`). No `forkId` query param — mountsync
  does not use forks.
- Body: `{"files": [...]}`; `Content-Type: application/json`; reuse
  `c.doJSON` for retry/backoff parity with the existing single-file write
  (see the sketch below).
- Empty `files` is a caller bug — return early with a typed sentinel; do
  not round-trip a guaranteed 400.
- Add the method to the `RemoteClient` interface at
  `internal/mountsync/syncer.go:96` so `fakeRemoteClient` in tests can
  implement it.

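A minimal sketch of the method, assuming `c.doJSON` takes
`(ctx, method, path, reqBody, respOut)` and already owns marshalling,
`Content-Type`, and the shared retry/backoff loop; the real helper's
signature may differ, and `ErrEmptyBulkWrite` is a hypothetical name for
the typed sentinel:

```go
// ErrEmptyBulkWrite is the typed sentinel for a zero-length batch
// (hypothetical name; match the package's existing error style).
var ErrEmptyBulkWrite = errors.New("mountsync: WriteFilesBulk called with no files")

func (c *HTTPClient) WriteFilesBulk(
    ctx context.Context,
    workspaceID string,
    files []BulkWriteFile,
) (BulkWriteResponse, error) {
    // Empty batch is a caller bug; fail fast rather than round-trip a
    // guaranteed 400.
    if len(files) == 0 {
        return BulkWriteResponse{}, ErrEmptyBulkWrite
    }
    reqBody := struct {
        Files []BulkWriteFile `json:"files"`
    }{Files: files}
    path := fmt.Sprintf("/v1/workspaces/%s/fs/bulk", url.PathEscape(workspaceID))
    var resp BulkWriteResponse
    // doJSON is assumed to marshal reqBody, set Content-Type, apply the
    // shared retry/backoff, and decode the 202 body into resp.
    if err := c.doJSON(ctx, http.MethodPost, path, reqBody, &resp); err != nil {
        return BulkWriteResponse{}, err
    }
    return resp, nil
}
```
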
## 3. Syncer batching strategy

**Concurrency invariant (preserve as-is):** `pushLocal`
(`internal/mountsync/syncer.go:1341`) already serializes writes within one
sync cycle by iterating `localRemotePaths` in sorted order, one goroutine,
no parallelism. The batching layer lives **between** the per-file handler
and `HTTPClient`; it does not introduce concurrency.

**Batch, don't stream.** Replace the per-file `s.client.WriteFile(...)`
call inside `pushSingleFile` (`internal/mountsync/syncer.go:599`) with an
enqueue into an in-cycle accumulator owned by the `Syncer`:

```go
type pendingBulkWrite struct {
    remotePath  string
    localPath   string
    snapshot    localSnapshot
    tracked     trackedFile
    exists      bool
    contentType string
    content     string
}
```

Rules:

1. **Full-cycle bulk** — the primary path. `pushLocal` accumulates every
   write that would have gone through `pushSingleFile` into a slice and
   flushes it with a single `WriteFilesBulk` call after the create/update
   loop completes but before the deletion loop (line 1420+). This is the
   common case (initial seed / mass local-edit storm) and collapses N HTTP
   requests into one.
2. **Chunk cap** — if the accumulator reaches `bulkFlushThreshold` (start
   with 256 files) during the loop, flush eagerly and reset (see the
   sketch after this list). Prevents unbounded request bodies on large
   `scanLocalFiles` results. Constant, not configurable in this migration.
3. **Per-file stays** for the fsnotify fast path
   (`handleLocalWriteOrCreate`, `internal/mountsync/syncer.go:497,524`).
   Single-event writes should still go as a batch of one through
   `WriteFilesBulk` (same endpoint, same code path — cheap) rather than
   re-enter the old `WriteFile` path. This keeps the `If-Match` header off
   the single-file PUT entirely and gives us one write code path to reason
   about.
4. **Conflict / read-after-write reconciliation** stays per-file.
   `WriteFilesBulk` does not return a new revision (the server response
   has no per-file revision; see `handleBulkWrite` at
   `internal/httpapi/server.go:1629-1634`). After a successful bulk flush,
   we must `ReadFile` each written path to refresh `tracked.Revision` and
   `tracked.Hash` so subsequent single-file operations (delete, conflict
   recovery) still have the optimistic-concurrency token they need. This
   read-back happens sequentially inside the cycle; we pay one GET per
   write, but we save N−1 PUTs and dodge 412s.
5. **Deletes are NOT batched** in this migration. `DeleteFile`
   (`internal/mountsync/syncer.go:191,734,1432`) still requires
   `If-Match`, and the server has no bulk-delete endpoint. Keep the
   existing per-file delete loop untouched.
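
A sketch of the accumulator and chunk cap from rules 1 and 2, assuming
hypothetical `Syncer` fields (`pendingWrites`, `workspaceID`) and a
`readBackBulkWrites` helper (sketched at the end of section 4); wiring
and error routing would follow the surrounding code's conventions:

```go
const bulkFlushThreshold = 256 // chunk cap (rule 2): constant, not configurable

// enqueueBulkWrite records one pending write and flushes eagerly when
// the chunk cap is reached. pushLocal calls flushBulkWrites once more
// after the create/update loop, before the deletion loop.
func (s *Syncer) enqueueBulkWrite(ctx context.Context, p pendingBulkWrite) error {
    s.pendingWrites = append(s.pendingWrites, p)
    if len(s.pendingWrites) >= bulkFlushThreshold {
        return s.flushBulkWrites(ctx)
    }
    return nil
}

// flushBulkWrites drains the accumulator with one WriteFilesBulk call,
// then runs the rule-4 read-back to refresh revisions.
func (s *Syncer) flushBulkWrites(ctx context.Context) error {
    if len(s.pendingWrites) == 0 {
        return nil // end-of-cycle flush with nothing queued is a no-op
    }
    files := make([]BulkWriteFile, 0, len(s.pendingWrites))
    for _, p := range s.pendingWrites {
        files = append(files, BulkWriteFile{
            Path:        p.remotePath,
            ContentType: p.contentType,
            Content:     p.content,
        })
    }
    resp, err := s.client.WriteFilesBulk(ctx, s.workspaceID, files)
    if err != nil {
        return err
    }
    pending := s.pendingWrites
    s.pendingWrites = s.pendingWrites[:0]
    return s.readBackBulkWrites(ctx, pending, resp.Errors)
}
```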

## 4. First-write 412 fix

**Production symptom** (from the collect-context log):
`http 412 precondition_failed: missing If-Match header` firing on the
first write of a new file, then `context deadline exceeded` cascading as
retries pile up.

**Root cause.** `pushSingleFile` at `internal/mountsync/syncer.go:592-599`
sets `baseRevision := "0"` as a "create-if-absent" sentinel and passes it
into `HTTPClient.WriteFile`, which unconditionally sets
`If-Match: "0"` at line 184. The server's single-file write endpoint
rejects non-empty, non-matching `If-Match` values on a path that does not
yet exist, surfacing as 412 `missing If-Match header`.

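For illustration only, a sketch of the request shape that triggers the
storm; this is not the actual `HTTPClient.WriteFile`, and the route's
`path` query parameter is an assumption:

```go
// firstWritePUT builds the kind of request the old path sends: a PUT
// for a path that does not exist yet, carrying the "0" sentinel.
func firstWritePUT(ctx context.Context, baseURL, workspaceID, remotePath, content string) (*http.Request, error) {
    u := baseURL + "/v1/workspaces/" + url.PathEscape(workspaceID) +
        "/fs/file?path=" + url.QueryEscape(remotePath)
    req, err := http.NewRequestWithContext(ctx, http.MethodPut, u, strings.NewReader(content))
    if err != nil {
        return nil, err
    }
    // The sentinel from pushSingleFile's baseRevision := "0". The path
    // has no stored revision yet, so the precondition check fails and
    // the server answers 412 precondition_failed.
    req.Header.Set("If-Match", "0")
    return req, nil
}
```
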
**Chosen fix: route all writes through `WriteFilesBulk` (Option A).**

The bulk endpoint (`handleBulkWrite`,
`internal/httpapi/server.go:1556`) does **not** consult `If-Match` at all
— it treats every entry as an upsert. Migrating all writes to bulk
*structurally removes* the 412 storm: there is no `If-Match` header left
on the write path for the server to reject.

The alternative (Option B: fetch the current revision when `baseRevision`
is empty/`"0"`) was rejected because:
- it adds a speculative GET in front of every first write, doubling
  latency on the hot path;
- it races with concurrent remote creates (the GET returns 404, we write
  with `If-Match: "0"`, and we are right back to the same 412);
- it leaves two write code paths (bulk for batches, per-file for
  singletons) — more code, more drift risk.

Concretely, `pushSingleFile` no longer calls `s.client.WriteFile` for the
create/update case. It enqueues into the bulk accumulator; the accumulator
flushes via `WriteFilesBulk`; the read-back loop (strategy rule 4)
refreshes revisions. `HTTPClient.WriteFile` stays on the struct for now
(used by `revertReadonlyFile` and conflict recovery paths that already
have a known revision), but its use shrinks to paths that always have a
non-sentinel revision — so `If-Match: "0"` never hits the wire again.

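A sketch of that read-back loop, assuming `ReadFile` returns the remote
revision and hash and that tracked state lives in `state.Files` as the
section 5 tests describe; `readBackBulkWrites` is the same hypothetical
helper named in the section 3 sketch:

```go
// readBackBulkWrites refreshes the optimistic-concurrency token for
// every path the bulk write accepted: one GET per write (rule 4).
func (s *Syncer) readBackBulkWrites(
    ctx context.Context,
    pending []pendingBulkWrite,
    bulkErrs []BulkWriteError,
) error {
    failed := make(map[string]bool, len(bulkErrs))
    for _, e := range bulkErrs {
        failed[e.Path] = true
    }
    for _, p := range pending {
        if failed[p.remotePath] {
            continue // failed paths keep their prior tracked state
        }
        remote, err := s.client.ReadFile(ctx, s.workspaceID, p.remotePath)
        if err != nil {
            return err
        }
        tracked := s.state.Files[p.remotePath]
        tracked.Revision = remote.Revision
        tracked.Hash = remote.Hash
        s.state.Files[p.remotePath] = tracked
    }
    return nil
}
```
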
## 5. New test cases

Added to `internal/mountsync/syncer_test.go`. Each test uses a fake
`RemoteClient` that records calls (a minimal recording double is sketched
after the list); none hit the real httpapi server except where noted.

1. **`TestBulkWrite_SingleCallForNFiles`** — local dir has 5 new files;
   after `SyncOnce`, the fake client records exactly one `WriteFilesBulk`
   call containing all 5 paths and zero `WriteFile` calls.
2. **`TestBulkWrite_FirstWriteNoIfMatch`** — against the real
   `newMountsyncAPIHandler` (same pattern as `TestBulkSeedThenSync`,
   `internal/mountsync/syncer_test.go:441`), seed nothing, write one new
   file locally, `SyncOnce`, assert the file lands on the server with
   HTTP 202 and no 412 is observed in the response log.
3. **`TestBulkWrite_ChunkAtThreshold`** — local dir has 300 files with
   `bulkFlushThreshold` forced to 256 via a test hook; assert exactly two
   `WriteFilesBulk` calls with sizes 256 and 44.
4. **`TestBulkWrite_PartialErrors`** — fake server returns a
   `BulkWriteResponse` with `errorCount=1` and one `forbidden` entry;
   assert the successful paths update `state.Files[remotePath].Revision`
   (via the read-back) while the failed path retains its prior tracked
   state and logs a denial, and the next `SyncOnce` does **not** re-push
   the denied file (matches the existing `WriteDenied`/`DeniedHash` skip
   at `internal/mountsync/syncer.go:1406`).
5. **`TestBulkWrite_DeletesStayPerFile`** — local dir drops a previously
   tracked file; assert `WriteFilesBulk` is NOT called for the delete and
   one `DeleteFile` call is issued with the tracked `If-Match` revision.

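A minimal sketch of the recording double, embedding the existing
`fakeRemoteClient` so only the new method needs an implementation; the
type and field names here are assumptions:

```go
// bulkRecordingClient records every WriteFilesBulk batch it receives and
// returns a canned response (hypothetical type for the tests above).
type bulkRecordingClient struct {
    fakeRemoteClient // existing fake covers the rest of RemoteClient

    mu        sync.Mutex
    bulkCalls [][]BulkWriteFile
    bulkResp  BulkWriteResponse // set ErrorCount/Errors for test 4
}

func (f *bulkRecordingClient) WriteFilesBulk(
    ctx context.Context,
    workspaceID string,
    files []BulkWriteFile,
) (BulkWriteResponse, error) {
    f.mu.Lock()
    defer f.mu.Unlock()
    // Copy the batch so later reuse of the caller's slice cannot skew
    // assertions about call counts and sizes.
    f.bulkCalls = append(f.bulkCalls, append([]BulkWriteFile(nil), files...))
    return f.bulkResp, nil
}
```
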
## 6. Deliberately out of scope

- Any change under `internal/httpapi/` or `internal/relayfile/` — the
  server-side `POST /fs/bulk` and `Store.BulkWrite` are already correct
  for our needs.
- Bulk delete. No server endpoint; per-file `DeleteFile` stays.
- FUSE / kernel mount layer. This migration is HTTP-client-only.
- Fork-aware bulk writes (`BulkWriteFork`). Mountsync does not target
  forks.
- Websocket event path (`websocketURL`, `syncer.go:1706`). Untouched.
- Retry/backoff tuning in `HTTPClient.doJSON`. Reuse as-is.
- Content encoding / binary handling beyond what
  `BulkWriteFile.Encoding` already carries.

## 7. Acceptance gates

Before commit, all of the following must pass locally from the repo root.

```bash
# 1. Package compiles and all mountsync tests pass, including new ones.
go test ./internal/mountsync/... -count=1

# 2. Targeted run of the bulk-migration tests (must all pass, no skips).
go test ./internal/mountsync/ -run 'TestBulkWrite_|TestBulkSeedThenSync' -count=1 -v

# 3. No residual per-file write call inside the push loop.
# Expect: zero matches.
grep -n 's.client.WriteFile(' internal/mountsync/syncer.go | grep -v revertReadonlyFile | grep -v conflict

# 4. WriteFilesBulk is wired into the interface and the HTTPClient.
# Expect: at least one match in each of these files.
grep -n 'WriteFilesBulk' internal/mountsync/syncer.go
grep -n 'WriteFilesBulk' internal/mountsync/syncer_test.go

# 5. No If-Match: "0" sentinel left on the wire.
# Expect: zero matches. (-r is required: the target is a directory.)
grep -rnE '"If-Match":\s*"0"' internal/mountsync/

# 6. Wider suite still green (regression check on httpapi + relayfile).
go test ./internal/httpapi/... ./internal/relayfile/... -count=1
```

Gate 3's filter is tolerant of the two legitimate remaining callers
(`revertReadonlyFile`, conflict-recovery read-after-write); if either is
renamed, update the grep alongside the rename in the same commit.

BOUNDARY_READY