Skip to content

Commit aaf29b9

Browse files
brookscclaude
andcommitted
fix(security): shared validation, fail-closed JSON, staged prompt UI, tests for johannesjo#34/johannesjo#37/johannesjo#40
- Extract validateBranchName into electron/mcp/validation.ts; apply at MCP and REST boundaries (johannesjo#38) - Fail-closed on malformed .mcp.json in register.ts: parse/validate before writing temp config (johannesjo#42) - Show staged coordinator prompt text and countdown in read-only TaskPanel without "Take Control" (johannesjo#44) - Add getTaskDiff preamble-bearing file normalization tests (AGENTS.md + settings.local.json) (johannesjo#34) - Add closeTask IPC ordering tests: MCP_CoordinatedTaskClosed and MCP_CoordinatorDeregistered (johannesjo#37) - Add .mcp.json merge/cleanup tests via deregisterCoordinator (johannesjo#40) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent e75bc96 commit aaf29b9

9 files changed

Lines changed: 740 additions & 68 deletions

File tree

TODOS.md

Lines changed: 77 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -44,7 +44,7 @@ Already resolved by TODO #14's lazy `getCoordinator()` pattern. All coordinator
4444

4545
---
4646

47-
### 31. Docker sub-tasks: one container per sub-task instead of docker exec
47+
### ~~31. Docker sub-tasks: one container per sub-task instead of docker exec~~ ✅ COMPLETE
4848

4949
**Current approach:** The coordinator spawns one `docker run` container and all sub-tasks run inside it via `docker exec`. This causes two known problems: HOME collision (#19, partially mitigated) and shaky process cleanup (#18, partially mitigated).
5050

@@ -64,15 +64,15 @@ Already resolved by TODO #14's lazy `getCoordinator()` pattern. All coordinator
6464

6565
---
6666

67-
### 32. Preamble injection uses synchronous file I/O on the main thread
67+
### ~~32. Preamble injection uses synchronous file I/O on the main thread~~ ✅ COMPLETE
6868

6969
**File:** `electron/mcp/coordinator.ts` (`createTask`, ~line 444–499)
7070
**What's wrong:** `createTask` uses `readFileSync`/`writeFileSync` for preamble injection into the sub-task worktree. This blocks the Electron main thread on every sub-task creation. With multiple concurrent sub-tasks, it can also race — two `create_task` calls reading the same file before either writes.
7171
**Done when:** Preamble injection uses `fs/promises` (`readFile`/`writeFile`) and either serializes writes per-path or is made safe for concurrent calls.
7272

7373
---
7474

75-
### 34. Preamble stripping can hide or silently delete legitimate task changes — P1
75+
### ~~34. Preamble stripping can hide or silently delete legitimate task changes — P1~~ ✅ COMPLETE
7676

7777
**Files:** `electron/mcp/coordinator.ts` (`getTaskDiff` ~line 760/777; `mergeTask` cleanup ~line 972)
7878
**What's wrong (two facets):**
@@ -102,7 +102,7 @@ Implementation constraints:
102102

103103
---
104104

105-
### 33. No integration tests for post-restart coordinator flow
105+
### ~~33. No integration tests for post-restart coordinator flow~~ ✅ COMPLETE
106106

107107
**Files:** `electron/mcp/coordinator.ts` (`hydrateTask`), `src/App.tsx` (restart restore path)
108108
**What's wrong:** `hydrateTask` restores output callbacks, `setMCPServerInfo` rewrites config files, and `StartMCPServer` is awaited before child hydration — but there are no tests exercising the full restart → re-subscribe → `wait_for_idle` / `wait_for_signal_done` round-trip. A regression in the restart path would be invisible.
@@ -111,7 +111,7 @@ Implementation constraints:
111111

112112
---
113113

114-
### 35. Missing tests for `StartMCPServer` IPC input validation
114+
### ~~35. Missing tests for `StartMCPServer` IPC input validation~~ ✅ COMPLETE
115115

116116
**File:** `electron/ipc/register-mcp.test.ts` (or new `electron/ipc/register.test.ts`)
117117
**What's wrong:** The `StartMCPServer` handler now validates renderer-supplied paths and IDs, but there are no tests exercising the rejection paths. A future refactor could accidentally remove or weaken the guards invisibly.
@@ -128,7 +128,7 @@ Implementation constraints:
128128

129129
---
130130

131-
### 36. Missing tests for hydrated `mcpConfigPath` directory scoping
131+
### ~~36. Missing tests for hydrated `mcpConfigPath` directory scoping~~ ✅ COMPLETE
132132

133133
**File:** `electron/mcp/coordinator.test.ts` (hydrateTask describe block)
134134
**What's wrong:** `hydrateTask` now validates `mcpConfigPath` against exact expected paths, but only the happy path is tested. Path-traversal and wrong-directory inputs could silently fall back to `undefined` and be mistaken for correct behavior.
@@ -143,7 +143,7 @@ Implementation constraints:
143143

144144
---
145145

146-
### 37. Missing tests for awaited coordinator cleanup ordering
146+
### ~~37. Missing tests for awaited coordinator cleanup ordering~~ ✅ COMPLETE
147147

148148
**File:** `src/store/tasks.test.ts`
149149
**What's wrong:** `MCP_CoordinatorDeregistered` and `MCP_CoordinatedTaskClosed` are now awaited before UI state is removed, but there are no tests asserting the order or handling a rejection.
@@ -157,7 +157,7 @@ Implementation constraints:
157157

158158
---
159159

160-
### 38. MCP/REST `baseBranch` bypasses IPC branch-name guard — P2
160+
### ~~38. MCP/REST `baseBranch` bypasses IPC branch-name guard~~ ✅ COMPLETE
161161

162162
**Files:** `electron/mcp/server.ts:69`, `electron/remote/server.ts:291`, `electron/ipc/git.ts:705`
163163
**What's wrong:** Normal IPC validates `baseBranch` before it reaches git. The MCP tool handler and REST `POST /api/tasks` only check that `baseBranch` is a string, then pass it into coordinator task creation and eventually git. `execFile` avoids shell injection, but git can interpret option-looking refs strangely, and this creates an inconsistency between UI-created tasks and coordinator-created tasks.
@@ -187,7 +187,7 @@ Implementation constraints:
187187

188188
---
189189

190-
### 39. No Docker coordinator child-close isolation test
190+
### ~~39. No Docker coordinator child-close isolation test~~ ✅ COMPLETE
191191

192192
**File:** `electron/mcp/coordinator.test.ts`
193193
**What's wrong:** When one of two running sub-tasks is closed, cleanup should target only that child's process/config. There is no test asserting that closing child A does not affect child B's config file or state.
@@ -196,7 +196,7 @@ Implementation constraints:
196196

197197
---
198198

199-
### 40. No `.mcp.json` merge/cleanup test
199+
### ~~40. No `.mcp.json` merge/cleanup test~~ ✅ COMPLETE
200200

201201
**File:** `electron/ipc/register-mcp.test.ts` (or `electron/mcp/coordinator.test.ts`)
202202
**What's wrong:** The `.mcp.json` read-before-write logic merges only the `parallel-code` key and preserves other servers, but this is untested. A regression could silently destroy user-configured MCP servers on coordinator startup.
@@ -214,7 +214,7 @@ Implementation constraints:
214214

215215
---
216216

217-
### 41. Coordinator cleanup reports success even when worktree/branch deletion fails — P2
217+
### ~~41. Coordinator cleanup reports success even when worktree/branch deletion fails~~ ✅ COMPLETE
218218

219219
**File:** `electron/mcp/coordinator.ts` (`cleanupTask` ~line 1030; Docker inner-process cleanup ~line 1074)
220220
**What's wrong:** `cleanupTask` catches `deleteTask` failure, logs a warning, then removes backend state and emits `MCP_TaskClosed`. Docker inner-process cleanup is also fire-and-forget, so worktree deletion can race an agent process that is still alive. The UI and backend believe the task is cleanly closed while the worktree, branch, or Docker process may still exist.
@@ -238,7 +238,7 @@ Implementation constraints:
238238

239239
---
240240

241-
### 42. Malformed existing `.mcp.json` is silently overwritten — P3
241+
### ~~42. Malformed existing `.mcp.json` is silently overwritten~~ ✅ COMPLETE
242242

243243
**File:** `electron/ipc/register.ts` (~line 1406)
244244
**What's wrong:** If `.mcp.json` exists but cannot be parsed (malformed JSON), the handler treats it as empty and writes a new file. This silently destroys a user's malformed-but-recoverable MCP config — other servers they have configured are gone.
@@ -257,7 +257,7 @@ Implementation constraints:
257257

258258
---
259259

260-
### 43. Restore failure leaves coordinator tasks permanently unspawned — P2/P3
260+
### ~~43. Restore failure leaves coordinator tasks permanently unspawned~~ ✅ COMPLETE
261261

262262
**Files:** `src/App.tsx` (~line 349), `src/components/TerminalView.tsx` (~line 637)
263263
**What's wrong:** On app restore, coordinator and coordinated task PTYs are gated on `mcpReady`. If `StartMCPServer` or `MCP_HydrateCoordinatedTask` fails (bad persisted path, config write error, Docker path issue, transient startup failure), the error is only logged and `mcpReady` is never set to `true`. `TerminalView` then waits indefinitely — the agent process is never spawned and there is no user-visible error or retry path. The task appears stuck/dead.
@@ -282,7 +282,7 @@ Implementation constraints:
282282

283283
---
284284

285-
### 44. Staged coordinator prompt is not visible unless the user takes control — UX
285+
### ~~44. Staged coordinator prompt is not visible unless the user takes control~~ ✅ COMPLETE
286286

287287
**Files:** Coordinator notification UI (staged notification section, `src/components/`)
288288
**What's wrong:** When the coordinator is driving, the staged-notification section collapses. If a prompt has been queued for autofire and the user wants to see it, they must click "Take Control" to expand the section — which is a heavyweight action taken only for visibility. The user shouldn't need to claim control just to read what's pending.
@@ -292,9 +292,67 @@ Implementation constraints:
292292

293293
---
294294

295-
### 45. `wait_for_signal_done` network errors silently stall the coordinator — P1
295+
### ~~45. `wait_for_signal_done` network retry must be replay-safe~~ ✅ COMPLETE
296296

297-
**Files:** `electron/mcp/coordinator.ts` (`waitForSignalDone`), `electron/remote/server.ts` (long-poll endpoint)
298-
**What's wrong:** `wait_for_signal_done` long-polls the local HTTP server. If the request fails with a network error (connection reset, transient local socket error), the MCP tool call rejects and the coordinator loses the completion signal — even though the sub-task already called `signal_done` successfully. The coordinator then stalls indefinitely, unaware the task is done, until a human intervenes.
299-
**Fix direction:** On network error, retry `wait_for_signal_done` automatically with a short backoff rather than propagating the rejection. Since `signal_done` writes durable state server-side (task status), a retry will immediately return the already-signaled result. Cap retries (e.g., 5 attempts over 30 s) and only surface a hard failure if the server is genuinely unreachable for an extended period. Alternatively, expose a `get_task` status poll as a fallback so the coordinator can recover by checking task state directly.
300-
**Done when:** A transient `fetch failed` on `wait_for_signal_done` is retried transparently and the coordinator receives the signal-done result without human intervention.
297+
**Files:** `electron/mcp/client.ts` (`waitForSignalDone`), `electron/mcp/coordinator.ts` (`signalDone`, `waitForSignalDone`), `electron/remote/server.ts` (`/api/wait-signal`)
298+
**What's wrong:** Retrying the MCP client call on a network `TypeError` is not enough by itself. If the first long-poll receives `signal_done`, the backend currently marks `task.signalDoneConsumed = true` before the HTTP response reaches the MCP process. If the connection drops after that consumption but before the client receives the body, the retry finds no unconsumed signal and can block until timeout. The signal was durable, but the delivery result was not replayable.
299+
300+
Implementation constraints:
301+
302+
- Make `wait_for_signal_done` consumption idempotent across transport failure. Either add a request/client id and replay the result for that id, or keep a short-lived last-delivered result per coordinator/task that a retry can return.
303+
- Preserve existing semantics: HTTP 4xx/5xx application errors should not be retried as network failures.
304+
- Keep retry bounded by the original `timeoutMs`; do not let backoff extend a caller's requested wait indefinitely.
305+
- Do not restage duplicate UI notifications or decrement/alter `remaining` twice when a replay happens.
306+
- Log enough context to distinguish "new signal consumed" from "previous signal replayed".
307+
308+
**Tests to add:**
309+
310+
- Active waiter receives `signal_done`, server resolves the waiter, client sees a simulated network `TypeError`, retry returns the same task result immediately.
311+
- Replay does not double-count `remaining`.
312+
- Replay does not re-stage or auto-fire duplicate coordinator notifications.
313+
- HTTP 4xx/5xx failures are not retried.
314+
- Repeated network `TypeError`s stop after the configured retry/timeout boundary.
315+
316+
**Done when:** A transient response-loss/network failure after backend signal consumption still returns the consumed signal to the coordinator exactly once, without human intervention or duplicate UI side effects.
317+
318+
---
319+
320+
### 46. Comprehensive coordinator regression test suite — P1/P2
321+
322+
**Goal:** Add tests around coordinator invariants so future changes cannot regress PR #100 behavior. Prefer real temporary git repos for diff/merge behavior and focused unit tests for pure validation/state-machine helpers. Do not rely only on mocked git output for preamble diff correctness.
323+
324+
**Priority 1 — prevent known P1 regressions:**
325+
326+
- `wait_for_signal_done` replay/idempotency: cover response loss after backend consumption, retry replay, no duplicate `remaining`, no duplicate notifications, HTTP errors not retried, and bounded network retry.
327+
- Normalized preamble diff: use a real temp git repo and verify `baseBranch` undefined uses the same detected-main diff base as `getAllFileDiffs`, not `HEAD`.
328+
- Preamble-bearing files: preamble-only changes are hidden; legitimate edits before the generated block are shown; legitimate edits after the generated block are shown; edits on both sides are shown.
329+
- Preamble merge cleanup: merge strips only the generated block and preserves legitimate edits before/after it.
330+
- `.claude/settings.local.json`: preserves unrelated keys, preserves pre-existing non-generated `systemPrompt`, removes only the generated block, deletes `systemPrompt` only when empty, and does not silently rewrite malformed JSON.
331+
332+
**Priority 2 — lifecycle and recovery coverage:**
333+
334+
- Restart/hydration: app restart rewrites child MCP config with fresh `subtaskToken`, never the stale or coordinator token.
335+
- Restart/hydration: `hydrateTask` restores `wait_for_idle` and unconsumed `signal_done`.
336+
- MCP startup failure: failed `StartMCPServer` / child hydration marks a visible error instead of leaving `TerminalView` waiting forever.
337+
- MCP retry: retry transitions `error -> pending -> ready`; child retry requires parent coordinator readiness and surfaces that dependency when missing.
338+
- Cleanup failure: `deleteTask` failure does not emit `MCP_TaskClosed`, retains backend state for retry, and marks the renderer task recoverable.
339+
- Merge cleanup failure: represent "merge succeeded, cleanup failed" separately from merge failure.
340+
341+
**Priority 3 — boundary/security coverage:**
342+
343+
- MCP/REST/IP C validation parity: `baseBranch` rejects empty strings, whitespace-only values, leading `-`, and control characters through public boundaries; valid refs still work.
344+
- Token/task scoping: coordinator token cannot access another coordinator's tasks; subtask token can only call `signal_done`; subtask token cannot use websocket.
345+
- Remote/mobile scoping: missing `X-Coordinator-Id` behavior is explicitly tested only for routes that are intended to support unscoped remote/mobile access.
346+
- Docker isolation: closing child A does not stop child B, delete child B's config, or remove child B from `listTasks()`.
347+
- Docker reachability: Docker MCP URL remains reachable from containers, while non-Docker mode does not widen bind address unless the owner-approved design requires it.
348+
349+
**Recommended structure:**
350+
351+
- Pure helper tests for branch validation, preamble-block removal, network-error retry classification, and replay-cache behavior.
352+
- Real temp git repo tests for `getTaskDiff()` and merge cleanup.
353+
- Unit tests for coordinator state transitions and renderer store error states.
354+
- One golden coordinator flow sequence test: `create_task -> wait_for_idle -> signal_done -> wait_for_signal_done -> get_task_diff -> review_and_merge -> cleanup`.
355+
- One restart sequence test: `persist -> reload -> StartMCPServer -> hydrate children -> respawn -> signal_done`.
356+
- Keep Docker runtime tests opt-in with `RUN_DOCKER_MCP_TEST=1`, but keep command-construction and state-isolation tests in the normal suite.
357+
358+
**Done when:** The suite fails for the known replay/idempotency and normalized-diff-base bugs, passes after those fixes, and covers the coordinator lifecycle from creation through restart, review/merge, and cleanup.

0 commit comments

Comments
 (0)