You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: TODOS.md
+77-19Lines changed: 77 additions & 19 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -44,7 +44,7 @@ Already resolved by TODO #14's lazy `getCoordinator()` pattern. All coordinator
44
44
45
45
---
46
46
47
-
### 31. Docker sub-tasks: one container per sub-task instead of docker exec
47
+
### ~~31. Docker sub-tasks: one container per sub-task instead of docker exec~~ ✅ COMPLETE
48
48
49
49
**Current approach:** The coordinator spawns one `docker run` container and all sub-tasks run inside it via `docker exec`. This causes two known problems: HOME collision (#19, partially mitigated) and shaky process cleanup (#18, partially mitigated).
50
50
@@ -64,15 +64,15 @@ Already resolved by TODO #14's lazy `getCoordinator()` pattern. All coordinator
64
64
65
65
---
66
66
67
-
### 32. Preamble injection uses synchronous file I/O on the main thread
67
+
### ~~32. Preamble injection uses synchronous file I/O on the main thread~~ ✅ COMPLETE
**What's wrong:**`createTask` uses `readFileSync`/`writeFileSync` for preamble injection into the sub-task worktree. This blocks the Electron main thread on every sub-task creation. With multiple concurrent sub-tasks, it can also race — two `create_task` calls reading the same file before either writes.
71
71
**Done when:** Preamble injection uses `fs/promises` (`readFile`/`writeFile`) and either serializes writes per-path or is made safe for concurrent calls.
72
72
73
73
---
74
74
75
-
### 34. Preamble stripping can hide or silently delete legitimate task changes — P1
75
+
### ~~34. Preamble stripping can hide or silently delete legitimate task changes — P1~~ ✅ COMPLETE
**What's wrong:**`hydrateTask` restores output callbacks, `setMCPServerInfo` rewrites config files, and `StartMCPServer` is awaited before child hydration — but there are no tests exercising the full restart → re-subscribe → `wait_for_idle` / `wait_for_signal_done` round-trip. A regression in the restart path would be invisible.
@@ -111,7 +111,7 @@ Implementation constraints:
111
111
112
112
---
113
113
114
-
### 35. Missing tests for `StartMCPServer` IPC input validation
**File:**`electron/ipc/register-mcp.test.ts` (or new `electron/ipc/register.test.ts`)
117
117
**What's wrong:** The `StartMCPServer` handler now validates renderer-supplied paths and IDs, but there are no tests exercising the rejection paths. A future refactor could accidentally remove or weaken the guards invisibly.
@@ -128,7 +128,7 @@ Implementation constraints:
128
128
129
129
---
130
130
131
-
### 36. Missing tests for hydrated `mcpConfigPath` directory scoping
**What's wrong:**`hydrateTask` now validates `mcpConfigPath` against exact expected paths, but only the happy path is tested. Path-traversal and wrong-directory inputs could silently fall back to `undefined` and be mistaken for correct behavior.
@@ -143,7 +143,7 @@ Implementation constraints:
143
143
144
144
---
145
145
146
-
### 37. Missing tests for awaited coordinator cleanup ordering
**What's wrong:**`MCP_CoordinatorDeregistered` and `MCP_CoordinatedTaskClosed` are now awaited before UI state is removed, but there are no tests asserting the order or handling a rejection.
**What's wrong:** Normal IPC validates `baseBranch` before it reaches git. The MCP tool handler and REST `POST /api/tasks` only check that `baseBranch` is a string, then pass it into coordinator task creation and eventually git. `execFile` avoids shell injection, but git can interpret option-looking refs strangely, and this creates an inconsistency between UI-created tasks and coordinator-created tasks.
@@ -187,7 +187,7 @@ Implementation constraints:
187
187
188
188
---
189
189
190
-
### 39. No Docker coordinator child-close isolation test
190
+
### ~~39. No Docker coordinator child-close isolation test~~ ✅ COMPLETE
191
191
192
192
**File:**`electron/mcp/coordinator.test.ts`
193
193
**What's wrong:** When one of two running sub-tasks is closed, cleanup should target only that child's process/config. There is no test asserting that closing child A does not affect child B's config file or state.
@@ -196,7 +196,7 @@ Implementation constraints:
196
196
197
197
---
198
198
199
-
### 40. No `.mcp.json` merge/cleanup test
199
+
### ~~40. No `.mcp.json` merge/cleanup test~~ ✅ COMPLETE
**What's wrong:** The `.mcp.json` read-before-write logic merges only the `parallel-code` key and preserves other servers, but this is untested. A regression could silently destroy user-configured MCP servers on coordinator startup.
@@ -214,7 +214,7 @@ Implementation constraints:
214
214
215
215
---
216
216
217
-
### 41. Coordinator cleanup reports success even when worktree/branch deletion fails — P2
217
+
### ~~41. Coordinator cleanup reports success even when worktree/branch deletion fails~~ ✅ COMPLETE
**What's wrong:**`cleanupTask` catches `deleteTask` failure, logs a warning, then removes backend state and emits `MCP_TaskClosed`. Docker inner-process cleanup is also fire-and-forget, so worktree deletion can race an agent process that is still alive. The UI and backend believe the task is cleanly closed while the worktree, branch, or Docker process may still exist.
@@ -238,7 +238,7 @@ Implementation constraints:
238
238
239
239
---
240
240
241
-
### 42. Malformed existing `.mcp.json` is silently overwritten — P3
241
+
### ~~42. Malformed existing `.mcp.json` is silently overwritten~~ ✅ COMPLETE
242
242
243
243
**File:**`electron/ipc/register.ts` (~line 1406)
244
244
**What's wrong:** If `.mcp.json` exists but cannot be parsed (malformed JSON), the handler treats it as empty and writes a new file. This silently destroys a user's malformed-but-recoverable MCP config — other servers they have configured are gone.
**What's wrong:** On app restore, coordinator and coordinated task PTYs are gated on `mcpReady`. If `StartMCPServer` or `MCP_HydrateCoordinatedTask` fails (bad persisted path, config write error, Docker path issue, transient startup failure), the error is only logged and `mcpReady` is never set to `true`. `TerminalView` then waits indefinitely — the agent process is never spawned and there is no user-visible error or retry path. The task appears stuck/dead.
@@ -282,7 +282,7 @@ Implementation constraints:
282
282
283
283
---
284
284
285
-
### 44. Staged coordinator prompt is not visible unless the user takes control — UX
285
+
### ~~44. Staged coordinator prompt is not visible unless the user takes control~~ ✅ COMPLETE
**What's wrong:** When the coordinator is driving, the staged-notification section collapses. If a prompt has been queued for autofire and the user wants to see it, they must click "Take Control" to expand the section — which is a heavyweight action taken only for visibility. The user shouldn't need to claim control just to read what's pending.
**What's wrong:**`wait_for_signal_done` long-polls the local HTTP server. If the request fails with a network error (connection reset, transient local socket error), the MCP tool call rejects and the coordinator loses the completion signal — even though the sub-task already called `signal_done` successfully. The coordinator then stalls indefinitely, unaware the task is done, until a human intervenes.
299
-
**Fix direction:** On network error, retry `wait_for_signal_done` automatically with a short backoff rather than propagating the rejection. Since `signal_done` writes durable state server-side (task status), a retry will immediately return the already-signaled result. Cap retries (e.g., 5 attempts over 30 s) and only surface a hard failure if the server is genuinely unreachable for an extended period. Alternatively, expose a `get_task` status poll as a fallback so the coordinator can recover by checking task state directly.
300
-
**Done when:** A transient `fetch failed` on `wait_for_signal_done` is retried transparently and the coordinator receives the signal-done result without human intervention.
**What's wrong:** Retrying the MCP client call on a network `TypeError` is not enough by itself. If the first long-poll receives `signal_done`, the backend currently marks `task.signalDoneConsumed = true` before the HTTP response reaches the MCP process. If the connection drops after that consumption but before the client receives the body, the retry finds no unconsumed signal and can block until timeout. The signal was durable, but the delivery result was not replayable.
299
+
300
+
Implementation constraints:
301
+
302
+
- Make `wait_for_signal_done` consumption idempotent across transport failure. Either add a request/client id and replay the result for that id, or keep a short-lived last-delivered result per coordinator/task that a retry can return.
303
+
- Preserve existing semantics: HTTP 4xx/5xx application errors should not be retried as network failures.
304
+
- Keep retry bounded by the original `timeoutMs`; do not let backoff extend a caller's requested wait indefinitely.
305
+
- Do not restage duplicate UI notifications or decrement/alter `remaining` twice when a replay happens.
306
+
- Log enough context to distinguish "new signal consumed" from "previous signal replayed".
307
+
308
+
**Tests to add:**
309
+
310
+
- Active waiter receives `signal_done`, server resolves the waiter, client sees a simulated network `TypeError`, retry returns the same task result immediately.
311
+
- Replay does not double-count `remaining`.
312
+
- Replay does not re-stage or auto-fire duplicate coordinator notifications.
313
+
- HTTP 4xx/5xx failures are not retried.
314
+
- Repeated network `TypeError`s stop after the configured retry/timeout boundary.
315
+
316
+
**Done when:** A transient response-loss/network failure after backend signal consumption still returns the consumed signal to the coordinator exactly once, without human intervention or duplicate UI side effects.
317
+
318
+
---
319
+
320
+
### 46. Comprehensive coordinator regression test suite — P1/P2
321
+
322
+
**Goal:** Add tests around coordinator invariants so future changes cannot regress PR #100 behavior. Prefer real temporary git repos for diff/merge behavior and focused unit tests for pure validation/state-machine helpers. Do not rely only on mocked git output for preamble diff correctness.
323
+
324
+
**Priority 1 — prevent known P1 regressions:**
325
+
326
+
-`wait_for_signal_done` replay/idempotency: cover response loss after backend consumption, retry replay, no duplicate `remaining`, no duplicate notifications, HTTP errors not retried, and bounded network retry.
327
+
- Normalized preamble diff: use a real temp git repo and verify `baseBranch` undefined uses the same detected-main diff base as `getAllFileDiffs`, not `HEAD`.
328
+
- Preamble-bearing files: preamble-only changes are hidden; legitimate edits before the generated block are shown; legitimate edits after the generated block are shown; edits on both sides are shown.
329
+
- Preamble merge cleanup: merge strips only the generated block and preserves legitimate edits before/after it.
330
+
-`.claude/settings.local.json`: preserves unrelated keys, preserves pre-existing non-generated `systemPrompt`, removes only the generated block, deletes `systemPrompt` only when empty, and does not silently rewrite malformed JSON.
331
+
332
+
**Priority 2 — lifecycle and recovery coverage:**
333
+
334
+
- Restart/hydration: app restart rewrites child MCP config with fresh `subtaskToken`, never the stale or coordinator token.
335
+
- Restart/hydration: `hydrateTask` restores `wait_for_idle` and unconsumed `signal_done`.
336
+
- MCP startup failure: failed `StartMCPServer` / child hydration marks a visible error instead of leaving `TerminalView` waiting forever.
337
+
- MCP retry: retry transitions `error -> pending -> ready`; child retry requires parent coordinator readiness and surfaces that dependency when missing.
338
+
- Cleanup failure: `deleteTask` failure does not emit `MCP_TaskClosed`, retains backend state for retry, and marks the renderer task recoverable.
- MCP/REST/IP C validation parity: `baseBranch` rejects empty strings, whitespace-only values, leading `-`, and control characters through public boundaries; valid refs still work.
344
+
- Token/task scoping: coordinator token cannot access another coordinator's tasks; subtask token can only call `signal_done`; subtask token cannot use websocket.
345
+
- Remote/mobile scoping: missing `X-Coordinator-Id` behavior is explicitly tested only for routes that are intended to support unscoped remote/mobile access.
346
+
- Docker isolation: closing child A does not stop child B, delete child B's config, or remove child B from `listTasks()`.
347
+
- Docker reachability: Docker MCP URL remains reachable from containers, while non-Docker mode does not widen bind address unless the owner-approved design requires it.
348
+
349
+
**Recommended structure:**
350
+
351
+
- Pure helper tests for branch validation, preamble-block removal, network-error retry classification, and replay-cache behavior.
352
+
- Real temp git repo tests for `getTaskDiff()` and merge cleanup.
353
+
- Unit tests for coordinator state transitions and renderer store error states.
354
+
- One golden coordinator flow sequence test: `create_task -> wait_for_idle -> signal_done -> wait_for_signal_done -> get_task_diff -> review_and_merge -> cleanup`.
355
+
- One restart sequence test: `persist -> reload -> StartMCPServer -> hydrate children -> respawn -> signal_done`.
356
+
- Keep Docker runtime tests opt-in with `RUN_DOCKER_MCP_TEST=1`, but keep command-construction and state-isolation tests in the normal suite.
357
+
358
+
**Done when:** The suite fails for the known replay/idempotency and normalized-diff-base bugs, passes after those fixes, and covers the coordinator lifecycle from creation through restart, review/merge, and cleanup.
0 commit comments