Skip to content

feat(mcp): coordinator agent with task visibility, pilot/co-pilot control, and expanded options#100

Open
brooksc wants to merge 116 commits into
johannesjo:mainfrom
brooksc:feature/orchestrator-control-v2
Open

feat(mcp): coordinator agent with task visibility, pilot/co-pilot control, and expanded options#100
brooksc wants to merge 116 commits into
johannesjo:mainfrom
brooksc:feature/orchestrator-control-v2

Conversation

@brooksc
Copy link
Copy Markdown
Contributor

@brooksc brooksc commented May 4, 2026

Overview

This PR builds on the initial coordinating agent implementation by @cledoux95 (PR #31) with substantial fixes and new capabilities.

The core motivation: I wanted to use a coordinator agent to drive parallel workstreams, but still be able to monitor each sub-task, answer questions when an agent gets stuck, and take over a task when I want to make a decision myself — then hand it back. Think of it like a pilot/co-pilot handoff: explicit, visible, and safe.


Credit

The foundation of this PR is @cledoux95's work in #31, which introduced the coordinating agent concept, the MCP server/client, and the create_task / send_prompt / wait_for_idle / merge_task tools. This PR would not exist without that work.


What's new / changed

Sub-task visibility (addresses the core UX gap)

In #31, sub-tasks created by the coordinator were not visible in the sidebar. The coordinator agent was essentially working in the dark. Now each sub-task spawned via create_task appears as its own sidebar panel, giving you full terminal visibility into what every agent is doing — exactly like a manually created task.

Pilot/co-pilot control handoff

Each coordinated sub-task shows a banner indicating who is driving:

  • "Orchestrator driving" (subtle grey bar) — the coordinator agent has control; you can observe but the agent is running
  • "You have control — orchestrator is paused" (amber warning bar) — you've taken over; the coordinator cannot send further prompts until you explicitly return control

You can click "Take Control" at any time to pause the coordinator for that task, interact with the agent yourself, then click "Return to Orchestrator" when done.

Expanded create_task options

The coordinator agent can now specify:

  • skipPermissions: true — passes --dangerously-skip-permissions to the sub-agent so it runs fully autonomously without tool-approval interruptions
  • gitIsolation: "worktree" | "direct" | "none" — controls git isolation mode (defaults to "worktree" as before)

Fixes to #31

Several bugs were found and fixed while integrating and testing:

Issue Fix
Hardcoded port 7777 — if the port was in use, the MCP server silently failed findFreePort(7777, 7800) tries ports sequentially until one is free
Missing --mcp-config arg — the coordinator agent was spawned without the MCP config flag, so it had no MCP tools and fell back to bash orchestration TaskAITerminal now passes --mcp-config <path> when task.mcpConfigPath is set
Double-spawn — the orchestrator backend called spawnAgent directly AND the renderer's TerminalView also called it, killing the backend spawn and losing the output subscription Removed backend spawnAgent call; orchestrator now uses onPtyEvent('spawn', ...) to subscribe to output each time the renderer spawns the agent — survives restarts too
MCP server not restarted after app restart — persisted coordinator tasks had no server to connect to after restart, causing fetch failed on all MCP tool calls On startup, App.tsx now awaits StartMCPServer for each persisted coordinator task before the agents are spawned, refreshing the config file with the new port/token
deleteTask wrong call signature — the orchestrator called deleteTask with positional args but the function now takes an options object Fixed to deleteTask({ agentIds, branchName, deleteBranch, projectRoot })
Missing REST input validation — the orchestrator API routes accepted arbitrary input Added type guards returning 400 for invalid name, prompt, projectId, skipPermissions, gitIsolation
waitForIdle hangs when human takes controlsetTaskControl('human') left pending waiters stuck until timeout setTaskControl now unblocks pending waitForIdle callers on any control change
mcp_control_changed missing from preload allowlist Added to ALLOWED_CHANNELS in preload.cjs
Better connection error messagesfetch failed gave no actionable info MCPClient now wraps network errors: "Cannot reach Parallel Code at http://127.0.0.1:PORT. Is the app running?"

Tests

Added 25 new tests across two files:

  • electron/mcp/prompt-detect.test.tsstripAnsi and chunkContainsAgentPrompt (the core of waitForIdle idle detection)
  • electron/mcp/orchestrator.test.ts — control handoff logic: sendPrompt blocked/unblocked, waitForIdle behavior under human control, PTY exit propagation

The tests caught the waitForIdle hang bug described above before it shipped.


Test plan

  • Create a new task with Coordinator mode enabled
  • Coordinator agent spawns sub-tasks; each appears as a new sidebar panel
  • Sub-task panels show the "Orchestrator driving" banner
  • Click Take Control — banner switches to amber "You have control"
  • Coordinator cannot send prompts while human has control (MCP tool returns error)
  • Click Return to Orchestrator — coordinator resumes
  • Kill and relaunch the app; open a coordinator task — MCP tools still work (no fetch failed)
  • create_task with skipPermissions: true spawns sub-agent without permission prompts
  • npm run check && npm test — all pass

🤖 Generated with Claude Code


Review response (round 2)

Addressed all issues from @johannesjo's second review pass (b55dcc9..c80a029):

Security (P1)

Issue Fix
Coordinator token scoping bypassable — omitting X-Coordinator-Id gave unscoped access to all tasks Coordinator-class tokens now require a valid X-Coordinator-Id header on task routes; missing or unregistered header → 403
signal_done not task-scoped — any sub-task with subtaskToken could signal done for any other task Each sub-task now receives a unique doneToken in its MCP config; must be sent as X-Done-Token header on /done; wrong or missing token → 403
Atomic writes fail across filesystems (EXDEV) — temp file in os.tmpdir() threw on rename() when /tmp and the target are on different mounts Temp file now written to dirname(filePath) so it's always on the same mount
Retry after MCP error doesn't spawn PTY — ready-state watcher was only installed when initial status was pending; clicking Retry from error did nothing Watcher is now always installed for tasks with MCP lifecycle so Retry transitions through error → ready correctly
Child hydration ignores coordinator restore failurePromise.allSettled + unconditional loop hydrated children even when coordinator MCP restore failed Children are now skipped if their coordinator's mcpStartupStatus !== 'ready'; hydrateTask throws if coordinator is unregistered

Correctness (P2)

Issue Fix
Custom Docker image not propagated on new task / retryStartMCPServer calls in createTask and retryTaskMcpStartup omitted dockerImage dockerImage is now forwarded in both call sites
Remote access after MCP-only start advertises unreachable URLs — server started by StartMCPServer binds to 127.0.0.1; StartRemoteServer now stops and restarts on 0.0.0.0 if no coordinator is active Prevents advertising WiFi/Tailscale URLs that can't reach a loopback-only listener
Existing .mcp.json parallel-code entry overwritten then deleted — startup overwrote any pre-existing entry; deregistration always deleted it Previous entry saved in CoordinatorState.previousMcpParallelCode at write time and restored on deregistration

Minor

  • semgrep / gitleaks scripts now check whether the CLI is installed and print install instructions (brew install semgrep, etc.) rather than failing with a cryptic error; these are system tools and cannot be in devDependencies

@brooksc brooksc force-pushed the feature/orchestrator-control-v2 branch from 2e523b0 to b436f19 Compare May 4, 2026 03:27
@brooksc
Copy link
Copy Markdown
Contributor Author

brooksc commented May 5, 2026

@cledoux95 as you initiated this I'd love to get your thoughts on my building further on your work.
@johannesjo this PR as is, is useful but I don't think it's ready yet.

Here's what I'm trying to solve for.

  • I have a medium size project where I'm using (surprise!) parallel claude code to build it
  • I'm using backlog.md to build up a healthy list of items that I curate.
  • Prior to this I'd switch to the browser to view backlog.md's kanban, copy a TASK-###, switch to parallel code, open a new task and paste it in. And repeat until I had 3-5 tasks running.

What I'm hoping to get to:

  1. I have the coordinator agent reviewing what's on the backlog and determining what's next.
  2. It fires off up to X (e.g. 5) paralell claude codes
  3. I can see them and interact with them, take control from the claude code, interact with it and return control
  4. The coordinator monitors for when an agent is done. Flags to me when there are questions I need to address. Review's it's final results and then tells it to commit and rebase.

1-3 is working here (although the take control/return control may need some work)
4 -- the agent has to be prompted to "go check on the agents" and it then does the commit/rebase nicely and closes the window.

I'm thinking I need some webhook or way for agentN to notify the coordinator when it's done (hook...?)

Anyway... this may take some time to polish. @johannesjo What do you think about adding a "Beta" section in settings, when you're ready to integrate it - it can live in Beta for a little while so others can opt-in and shake it out if they want. I think it'll take some time to live with this to continue to refine it. I'm using it myself going forward.

@johannesjo
Copy link
Copy Markdown
Owner

johannesjo commented May 6, 2026

Thank you very much @brooksc (and @cledoux95 !) !! This is impressive work, and the integration fixes on top of #31 are exactly the kind of stuff that's painful to discover from the outside. I want to ship this, but I agree with you that it's not ready as a default-on feature, and I think your "Beta in Settings" instinct is the right framing. Let me make that concrete.

Yes to beta-gating, with a real flag. Specifically:

  • A single experimental.coordinatorMode setting, off by default.
  • When off, the MCP server module is not started (lazy import() on flag flip), the Coordinator option in NewTaskDialog is hidden, and the mcp_control_changed IPC channel isn't exposed to the renderer. Goal: zero footprint for users who don't opt in.
  • Coordinator-specific code lives in its own folders (electron/mcp/, src/components/coordinator/, a dedicated store slice), and the cross-cutting touches (TaskPanel, Sidebar) take a coordinator-aware prop rather than inlining the logic. This keeps future unrelated PRs from having to reason about coordinator state.

Three things I'd want addressed before I merge, even into beta:

  1. Post-restart MCP path – please add an integration test that confirms the port/token rotation actually rewrites the config the agent reads on next launch. That's the most fragile part by design and the unit tests don't cover it.
  2. skipPermissions guardrail--dangerously-skip-permissions is a real footgun. I'd like it to require both the beta flag and an explicit per-task confirmation in the UI, not just an MCP tool argument the coordinator can pass.
  3. Feature flag wraps startup + IPC registration, not just UI affordances – so toggling it off genuinely removes the surface area.

On your unsolved step #4 (coordinator noticing "agent done"): I think Claude Code's Stop / SubagentStop hooks are the right primitive here – a hook posting back to the local MCP server would replace the polling/manual-nudge loop with a real notify channel. Worth exploring before we promote out of beta; not a blocker for landing the beta itself.

Graduation criteria I'd want to hit before flipping the flag on by default:

  • Restart-with-running-coordinator works reliably across macOS and Linux.
  • No leaked PTY/zombie processes over a 24h session.
  • The "agent finished" notification is solved (hooks or equivalent).
  • A few weeks in beta with no P0 bugs from opt-in users.

If you're up for the gating work I'm happy to keep this open and help where useful. And to be clear: I do want to ship this – the pilot/co-pilot pattern is the right shape for human-in-the-loop orchestration, and you've already shaved off a lot of the rakes.

@brooksc
Copy link
Copy Markdown
Contributor Author

brooksc commented May 7, 2026

@johannesjo — thanks for the detailed review, this is exactly the kind of feedback that makes it worth sharing early. Lots to respond to.


Where things stand: working e2e

Since the original PR I've built considerably more on top. Here's what's confirmed working end-to-end:

Coordinator creation & MCP tooling

  • "Coordinator mode" checkbox in New Task dialog; only one active coordinator per project enforced
  • MCP server starts automatically; coordinator agent sees: create_task, list_tasks, get_task_status, send_prompt, wait_for_idle, wait_for_signal_done, get_task_diff, get_task_output, merge_task, close_task
  • Sub-tasks get only signal_done (not create_task etc.) so there's no runaway recursion

Sub-task lifecycle

  • create_task spawns a worktree + agent, injects a [SUB-TASK MODE] preamble so the agent knows its role and constraints
  • Sub-task inherits the coordinator's agent command, args, and --dangerously-skip-permissions state
  • merge_task and close_task work cleanly, including worktree cleanup

Coordinator ↔ sub-agent notification — and why we didn't use Stop hooks

This is where we diverge from your suggestion, and I think for good reason. You suggested Stop/SubagentStop hooks posting back to the MCP server. The problem: Claude Code fires Stop any time the agent pauses — including when it hits a question mid-task or runs out of context. What we actually want is "agent has finished its assigned work and is ready for review."

Instead, sub-agents call the signal_done MCP tool explicitly when they're done. The coordinator receives a staged notification into its PromptInput textarea with a summary of which tasks completed, which branch to review, and whether there was a non-zero exit. The notification auto-fires after a quiet period (default 60s, faster on error) if the coordinator is idle — so the coordinator agent processes it automatically without manual nudging. This is the step 4 from the original description working.

Additional notification paths:

  • If the coordinator calls send_prompt while the user has taken control of a sub-task, it gets an error immediately (not a silent hang). When the user returns control, the coordinator gets a "Task X is back under your control" notification so it can retry without human prompting.
  • If the user closes a sub-task before its prompt even lands, the coordinator still gets notified — previously it would silently think 2 tasks were running when only 1 was.

Pilot / co-pilot control handoff — how it works and two open items

Each sub-task panel shows a banner indicating who's driving. When the coordinator has control the banner reads "Coordinator driving" in grey. Clicking "Take Control" blocks the coordinator's send_prompt from reaching that sub-task — the banner turns amber. Clicking "Return to Coordinator" restores it and automatically notifies the coordinator if it was blocked mid-action.

Two things to note as known gaps: first, the button labels ("Take Control" / "Return to Orchestrator") overstate what's happening — the coordinator agent keeps running, only its ability to write to that terminal is paused. "Pause coordinator" / "Resume coordinator" would be more accurate. Second, the sub-task's PromptInput and raw xterm terminal are not currently gated on control state, so a user could type simultaneously with a coordinator send_prompt. The fix is to disable input when the coordinator has control and surface "Pause coordinator" as the way to unlock it — making the model self-documenting rather than relying on reading the banner.

Unit test coverage
379 tests covering: notification staging, ack/dedup, batch formatting, idle detection, signal_done, waiter resolver leak-on-timeout, per-task projectRoot isolation, spawn settings inheritance, control handoff notification, early-close notification.


What isn't done yet

Beta gating — you're right that zero-footprint opt-in is the right approach. We haven't done the experimental.coordinatorMode flag, lazy MCP server import, or IPC registration gating. That's the right next piece and I'm happy to do it.

Settings dialog placement — with coordinator mode, Docker, verbose logging, and more to come, Settings is getting long. A tabbed settings dialog (General / Experimental) would be a natural home for the beta flag, but that's its own PR and should land independently of this one.

Docker + coordinator — currently mutually exclusive. Sub-tasks spawned by a coordinator run as native host processes, defeating Docker isolation. Two approaches documented in KNOWN-TODOS.md: same container via docker exec, or per-sub-task containers. Prerequisite decision: should coordinator mode force direct git isolation (no worktree)? Architecturally that's cleaner since coordinators don't commit code themselves.

Post-restart MCP path integration test — fair ask. The port/token rotation path is the right thing to test. Will add.

skipPermissions guardrail — also fair. Currently if the coordinator is started with skip-permissions, all sub-tasks inherit it. An explicit per-task confirmation in the UI is the right guardrail. One nuance: for the "40 tasks, spawn 2 at a time" workflow, the user probably wants to grant it once rather than 40 times — so I'd suggest a "propagate skip-permissions to sub-tasks" checkbox that requires explicit opt-in, separate from the per-task flow.

Minor items (in KNOWN-TODOS.md): orphaned sub-task badge UI, re-stage notification after user manually sends an edited prompt, configurable notification delay, control handoff input gating and button renaming.


Happy to tackle the beta gating as the next chunk. Does the experimental.coordinatorMode approach you outlined work as the gating mechanism, or do you have a different shape in mind for the flag?

@brooksc brooksc marked this pull request as draft May 7, 2026 05:06
@brooksc brooksc force-pushed the feature/orchestrator-control-v2 branch from 4057d03 to 0be08ca Compare May 7, 2026 05:11
@johannesjo
Copy link
Copy Markdown
Owner

Thanks @brooksc — the depth of the response (and the 379 tests!) tells me where this is heading, and I'm on board with the shape. A few replies and a few new things I noticed reading the diff today.

signal_done over Stop/SubagentStop — agreed, you're right. Your reasoning on Stop firing for mid-task pauses is correct, and the auto-fire-on-quiet staging is a nicer fit than I'd given it credit for. Withdrawing that suggestion.

Beta gating shape: experimental.coordinatorMode works for me. Single boolean, persisted in app state, off by default. Worth doing as its own follow-up PR after the rest lands, or first-in-line — your call. I'd defer the tabbed Settings dialog as you suggested.

skipPermissions guardrail — the propagation checkbox is the right shape. Good refinement: a "propagate skip-permissions to sub-tasks" checkbox in the New Task dialog, defaulting off even when the coordinator itself was launched with skip-permissions. Don't auto-inherit silently. The 40-tasks-at-a-time workflow is real and worth optimizing for.

A few things I caught on a code pass that haven't come up yet:

  1. The post-restart fix described in the PR body isn't actually in the diff. The description says "App.tsx awaits StartMCPServer for each persisted coordinator task." `git grep StartMCPServer` finds it only in `src/store/tasks.ts` (the create-task path). `mcpConfigPath` is also absent from both `saveState` and `loadState` in `src/store/persistence.ts`, so on next launch `TaskAITerminal` skips `--mcp-config` entirely (it's gated on `props.task.mcpConfigPath` at `TaskAITerminal.tsx:223`) and the worktree's `.mcp.json` carries the previous run's token. Could you verify that test-plan item ("Kill and relaunch the app … MCP tools still work") is actually green today? My read is it would `fetch failed` on the first MCP call.

  2. `CLAUDE.md` mutation in the worktree is risky when combined with skip-permissions. `coordinator.ts` writes a `<!-- parallel-code-subtask-start -->` block into the worktree's `CLAUDE.md` and restores it via `setTimeout(..., 3000)` after first idle (`coordinator.ts:317,328`). A skip-permissions sub-agent that decides to `git add -A && git commit` early on will commit our injection. The `git restore` fallback runs with `stdio: 'ignore'` (line 322), so a failure is invisible. Could we use `--append-system-prompt` (if Claude Code supports it) or write to `.claude/settings.local.json` (already gitignored) instead of mutating a tracked file?

  3. `server.listen('0.0.0.0', ...)` + new mutating REST endpoints. Pre-existing for the mobile-remote feature, where token-bearer auth was guarding a read-mostly surface. With this PR, the same token now also gates `POST /api/tasks` (spawn worktree+process), `POST /api/tasks/:id/merge`, `DELETE /api/tasks/:id`, `POST /api/tasks/:id/prompt`. I think we should either (a) bind to 127.0.0.1 by default and only widen when the user explicitly enables remote mobile, or (b) scope the token so coordinator endpoints require an additional capability the mobile-remote token doesn't get. Option (a) is the smaller change.

  4. `waitForIdle` resolves silently on human takeover. `coordinator.ts:495` returns `Promise.resolve()` immediately when `controlMap.get(taskId) === 'human'`. The coordinator can't tell apart "agent went idle" from "human paused agent" — it just sees `{status: "running"}` afterwards and likely loops. Worth changing the resolved value to `{ reason: 'idle' | 'human_control' | 'exited' }` so the agent can branch on it.

  5. Token file permissions. `.mcp.json` (worktree, `register.ts:1075`), `parallel-code-mcp-*.json` (tmp, `register.ts:1067`), and the per-sub-task config (`coordinator.ts:374`) are all written with default umask = `0644`. On a shared machine, other local users can read the token. `{ mode: 0o600 }` everywhere we write a token-bearing config.

  6. Small things: `controlMap` accepts unknown taskIds (no `tasks.has` check in `setTaskControl`); `coordinatorTaskId: 'api'` is a magic-string sentinel that should be `undefined`; `get_task_diff` truncates at 50 KB without reporting the original size; the `gitIsolation` option mentioned in the PR description isn't in the `create_task` JSON schema in `server.ts`.

Item #1 is the only one that affects whether the test plan currently passes; the rest can ride along with the beta-gating PR. Excited about this — the pilot/co-pilot framing has aged well over the discussion.

@brooksc brooksc force-pushed the feature/orchestrator-control-v2 branch from 0be08ca to ad6746d Compare May 9, 2026 07:56
@brooksc
Copy link
Copy Markdown
Contributor Author

brooksc commented May 9, 2026

Update — coordinator mode: major second pass, largely self-developed

We've done a substantial second pass on this feature. The headline: most of the work in this update was done by the coordinator itself — we used the tool to fix the tool.

Video demo (recorded mid-session, before max-concurrent UI was added — shows the core flow live)

Design document: design.md — architecture overview, MCP server design, and data flow.


The coordinator ran this PR

To validate the implementation, we gave the coordinator a single prompt:

"Work through TODOS.md in your worktree. Use feature/orchestrator-control-v2 as the baseBranch for all sub-tasks."

From that one prompt, the coordinator:

  • Read TODOS.md and decomposed 13 bugs autonomously — no further input from us
  • Named each sub-agent task descriptively (fix-r1-task-closed-neighbor, fix-r2-persist-signal-done-review, fix-9-backend-task-registry-hydration, etc.)
  • Spawned up to 3 sub-agents in parallel, each in an isolated git worktree on its own branch
  • Managed the sliding-window loop: as each sub-agent called signal_done, the coordinator dispatched a landing agent to diff → merge → close while immediately spawning the next task
  • Sequenced tasks with file dependencies correctly (e.g., backend hydration before controlMap restore) without being told to
  • Merged all 13 branches into feature/orchestrator-control-v2 via squash commits
  • Total wall-clock time: ~54 minutes for 13 bugs

The only human involvement was watching the coordinator work and confirming the final result — no manual coding.


What changed since the initial revision

Control model renamed and extended

  • "Pause/Resume coordinator" → Take Control / Release Control — clearer ownership language
  • The coordinator task itself now shows the same control bar, with autofire skipping ticks while the user has control (prevents spurious 10-miss escalation when you're mid-thought)
  • Discoverability tooltip: clicking inside a coordinator panel while it's in auto mode shows a hint pointing at Take Control (shown max 3 times, dismissable with "Don't show again", count persisted)

Max concurrent tasks UI
Numeric input in the coordinator dialog (1–10, default 3). Injected into the preamble at task creation time. The selected baseBranch is also auto-injected so users don't need to repeat it in the prompt.

Coordinator preamble rewritten from scratch
Iterated through live sessions — including asking the coordinator directly what was confusing. Key rules now encoded:

  • Sliding-window pattern (spawn N, replace immediately on completion — never batch-drain)
  • Background Agent landing (dispatch native Agent for diff→merge→close, don't do it inline)
  • baseBranch must be the project's durable branch, not the coordinator's ephemeral task branch
  • Commit before merge_task — dirty tree fails the merge
  • Scope discipline — only read sources explicitly specified in the prompt
  • Sub-agents run tests/typecheck before calling signal_done

13 bugs fixed (all found during live coordinator sessions, most fixed by sub-agents):

Item Fix
R1 MCP_TaskClosed neighbor selection (idx captured before cleanup removes taskId)
R2 signalDoneReceived/needsReview now persisted across restarts
R3 createTask failure cleanup — no more zombie PTYs/tasks in memory
R4 .claude/settings.local.json preamble stripped before merge
item 9 Backend Coordinator.tasks hydrated on app restart
item 10 Backend controlMap restored on restart (human control respected after relaunch)
item 11 MCP_CoordinatorOrphanedNotification channel split; ack path now marks needsReview
item 12 review_and_merge_task deprecated in preamble; preamble guides get_task_diff → merge_task → close_task
item 13 gitIsolation now rejected at REST (400) instead of silently ignored
item 14 SubTaskStrip: clicking collapsed sub-task now uncollapses it
item 15 Preamble strip preserves intentionally empty tracked instruction files
item 18 Sidebar drag uses visible-order translation to keep coordinator+child blocks contiguous
item 19 Clicking a collapsed coordinator child also uncollapses the coordinator

Tests: 464 passing across 33 test files — this PR adds ~2,057 lines of new test code covering coordinator notification logic, control handoff, PTY idle detection, and git operations.


What's still open

Two known edge cases, intentionally deferred:

  • item 7 — Autofire escalates unnecessarily if the coordinator is mid-tool-call during a countdown. Rare timing window; no clean fix yet.
  • item 8 — If the coordinator process restarts (not the Electron app), MCP URL/token changes and running sub-tasks lose connection. App restart is handled; process restart is not.

Not yet tested: Docker integration — the coordinator skipPermissions propagation and Docker-mode sub-agent spawning have not been exercised end-to-end.


Test plan

  • Create coordinator task — control bar appears with "Take Control"
  • Sub-task panels show auto mode / "You have control" banner
  • Take Control on coordinator → autofire stops counting misses
  • Release Control → autofire resumes
  • Click in coordinator panel while in auto mode → tooltip appears (max 3×, "Don't show again" persists)
  • Kill and relaunch app → coordinator MCP tools still work, human-control sub-tasks stay locked
  • npm run check && npm test passes (464 tests)

🤖 Generated with Claude Code

@johannesjo
Copy link
Copy Markdown
Owner

I rechecked the current head (ad6746d). I’d still hold this before merging, even behind beta, because a few coordinator/MCP lifecycle boundaries are not doing what the PR/test plan says yet.

  1. Sub-task MCP is not limited to signal_done. electron/mcp/server.ts:48-209 exposes one unconditional tool list, including create_task, send_prompt, merge_task, and close_task. Sub-task configs are created with --task-id only (electron/mcp/coordinator.ts:470-476), but execution only role-checks wait_for_signal_done and signal_done (electron/mcp/server.ts:306-365). create_task remains callable at server.ts:219-227. This means a sub-task can still perform coordinator-level lifecycle operations, including spawning more tasks.

  2. The restart path still does not make coordinated tasks reliably usable. There is now startup restore code in src/App.tsx:328-370, but it does not await StartMCPServer, only iterates store.taskOrder, and hydrates children without their restored renderer agent IDs. The IPC handler then invents a fresh agentId (electron/ipc/register.ts:1221-1255), and Coordinator.hydrateTask() stores that random ID with status exited (electron/mcp/coordinator.ts:869-895). After relaunch, MCP calls such as send_prompt/wait_for_idle are not connected to the actual respawned PTY session, so the “kill and relaunch app → coordinator MCP tools still work” test plan item is still not guaranteed.

  3. wait_for_idle still mishandles human takeover, and the MCP result hides the useful reason. waitForIdle() returns { reason: 'human_control' } only if the task is already human-controlled at call time (electron/mcp/coordinator.ts:652-656). If a waiter is already pending, setTaskControl(..., 'human') does not resolve it; resolvers are fired only when control returns to 'coordinator' (coordinator.ts:113-125). Even when the coordinator returns a reason, the HTTP route discards it and replies only with { status } (electron/remote/server.ts:344-348), and the MCP client type also exposes only { status: string } (electron/mcp/client.ts:65-70).

  4. Failed coordinated task creation can leak worktrees and backend state. createTask() creates the backend worktree (electron/mcp/coordinator.ts:323-329) and inserts task/buffer/decoder state (:351-357) before several failure points. The guarded catch later only restores/deletes the injected preamble file (:578-590); it does not call deleteTask, remove this.tasks, clear buffers/decoders, unsubscribe, or delete MCP config. A bad agent command or spawn failure is enough to leave stale coordinator state plus a worktree/branch behind.

  5. signalDoneReceived and needsReview are not persisted. These runtime fields exist (src/store/types.ts:90-91) and are set from MCP events (src/store/tasks.ts:1039-1045), but saveState() omits them for active/collapsed tasks (src/store/persistence.ts:88-114, :123-150), PersistedTask does not include them, and load reconstruction does not restore them (persistence.ts:483-520, :562-602). A relaunch loses the done/review-needed UI state.

@johannesjo
Copy link
Copy Markdown
Owner

I rechecked current head c2727a0 with an additional pass. I'd still hold this before beta because a few lifecycle boundaries are still not reliable:

  1. Two new MCP IPC channels are blocked by preload. MCP_CoordinatorNotificationDropAck and MCP_HydrateCoordinatedTask exist in electron/ipc/channels.ts:156-159 and are used from src/components/PromptInput.tsx:452 / src/App.tsx:358, but electron/preload.cjs:124-143 does not allowlist them. That means autofire drop ack and restart hydration fail before reaching main.

  2. Coordinators are not scoped to their own project/tasks. create_task sends no projectId from the MCP server (electron/mcp/server.ts:75-80), while StartMCPServer overwrites a global default project. Also list_tasks, send_prompt, merge_task, and close_task accept only taskId without checking caller coordinatorTaskId (electron/remote/server.ts:300-325, :420-450). Multiple coordinators can see/control each other's children.

  3. Coordinator MCP is broken if remote access was started first. startRemoteServer() captures opts.coordinator when constructed, but StartMCPServer reuses an existing remoteServer (electron/ipc/register.ts:1027). If the remote server was already running with no coordinator, /api/tasks coordinator routes are absent.

  4. Restart restore is still racy/incomplete. StartMCPServer is fired without awaiting despite the comment saying config is rewritten before agents resume (src/App.tsx:329-349), and hydrateTask() does not restore signalDoneAt/consumed state (electron/mcp/coordinator.ts:902-914). Already-signaled children can be lost to wait_for_signal_done after restart.

  5. Closing a coordinated child from the UI leaves stale backend coordinator state. closeTask() only deregisters when the closed task is the coordinator (src/store/tasks.ts:349-368). Closing a child deletes the renderer task/worktree, but backend Coordinator.tasks can still list/merge/close that stale child.

  6. Docker coordinator paths do not line up. Docker coordinator mode writes mcp-server.cjs / .mcp.json under projectRoot (electron/ipc/register.ts:1059-1119), but the container only mounts the coordinator task cwd (electron/ipc/pty.ts:257-261). For worktree-isolated coordinator tasks, the container cannot see those files. Docker sub-task respawn args also omit -w <subtask worktree> (electron/mcp/coordinator.ts:567-569), unlike the initial backend docker exec.

These are mostly boundary/restore/scoping issues, but they affect the exact scenarios the beta gate is meant to make safe.

@brooksc
Copy link
Copy Markdown
Contributor Author

brooksc commented May 10, 2026

Pushed a significant round of fixes and improvements since the initial draft. Summary below.


Non-Docker coordinator — working well end-to-end

The core coordinator flow is solid in non-Docker mode and has been tested through several full runs. Key improvements in this update:

Take Control / Release Control
The coordinator task itself now has the same control bar as sub-tasks. Taking control pauses autofire so you can type freely; releasing it resumes immediately — the pending prompt fires on the next tick rather than waiting out the countdown. In coordinator-controlled mode the autofire delay is skipped entirely (the delay exists for humans to cancel; in auto mode it's just latency).

Autofire robustness
Three fixes to the miss/escalation path: coordinator mode skips the human-facing delay, the miss counter resets whenever the agent produces new output (so a long tool call no longer accumulates misses and triggers premature escalation), and releasing control triggers an immediate fire attempt rather than waiting up to 1s for the next interval tick.

Coordinator preamble
Sliding-window dispatch pattern, correct baseBranch rule (sub-tasks now branch from the coordinator's own commit — branching from the shared base caused sub-task diffs to include the entire coordinator feature), and landing agents must report back before the coordinator declares done.

Correctness / data integrity

  • initialPrompt persisted across app restarts — sub-tasks no longer start idle after a crash
  • hydrateTask restores signalDoneAt on restart — wait_for_signal_done no longer blocks forever after restart
  • StartMCPServer awaited before child hydration — agents no longer respawn before MCP is ready
  • MCP tool handlers scoped to coordinator ID — concurrent coordinators can no longer see or control each other's sub-tasks
  • setTaskControl awaits IPC before updating UI — optimistic state can no longer get out of sync with backend
  • Coordinator deregistration cleans all child backend resources — no dangling callbacks after teardown
  • Closing a coordinated child from the UI notifies the backend — stale Coordinator.tasks entries eliminated

Reliability

  • Sidebar drag index correct with coordinator children present
  • Coordinator tasks and controlMap hydrated from backend on app restart
  • Collapsing coordinator tasks blocked (preserves agent identity)
  • Collapsed children still visible in the coordinator sub-task strip
  • Coordinator children closed before coordinator on project removal
  • Preamble files restored on createTask failure, stripped before auto-commit in mergeTask
  • Stale Docker MCP URL detected and warned on startup
  • startRemoteServer awaits listen — fast clients can no longer connect before the port is bound
  • get_task_diff and review_and_merge_task surface uncommitted file warnings to the coordinator
  • Panel crash on coordinator close fixed (getWorktreeStatus guarded against deleted worktree path)

Security

  • MCP token passed via env var — no longer visible in ps aux

UX

  • Max concurrent sub-tasks input in the New Task dialog
  • needsReview flag reflected in task dot status

Docker coordinator — functional, one open design question

Docker mode works but there's an architectural question I'd like your input on before closing it out.

The bind address problem: The coordinator's MCP HTTP server needs to be reachable from inside the Docker container via host.docker.internal. On macOS Docker Desktop that hostname resolves to the host's gateway IP — not loopback — so 127.0.0.1:7777 is unreachable from the container. The current fix binds to 0.0.0.0.

This is the same concern you raised on the mobile-remote surface, except now the exposed endpoints are fully mutating: POST /api/tasks (spawns a worktree + process), POST /api/tasks/:id/merge, DELETE /api/tasks/:id, POST /api/tasks/:id/prompt. The token protects it, but anyone on the LAN can reach these endpoints whenever coordinator mode is active.

Options on the table:

  1. Keep 0.0.0.0, token protection only — simplest, current state
  2. Split into two servers: a minimal coordinator-only server on 0.0.0.0, mobile-remote stays independent and separate
  3. Unix socket bind-mounted into the container — eliminates TCP exposure entirely, larger change

Option (a) from your earlier suggestion — bind to 127.0.0.1 by default, widen only for explicit remote-mobile — breaks Docker coordinator since host.docker.internal can't reach loopback.

Happy to implement whichever direction you prefer. The rest of the Docker fixes (.git dir mounted so git works inside the container, isolated HOME dirs per sub-task, inner process cleanup on sub-task close, macOS TMPDIR vars blocked from container env) are already landed.

Also noting a future improvement worth discussing: switching from docker exec (all sub-tasks share one container) to docker run per sub-task. This eliminates HOME collisions by design and makes process cleanup a clean docker stop. Not blocked on the bind address question but related.

@brooksc brooksc marked this pull request as ready for review May 10, 2026 07:33
@johannesjo
Copy link
Copy Markdown
Owner

I rechecked current head b55dcc9 with an independent sub-agent pass. I'd still hold this before beta; a few lifecycle/security edges are still concrete:

  1. merge_task now fails for any sub-task that creates new files. mergeTask() strips preamble files and then runs git add -u before the auto-commit (electron/mcp/coordinator.ts:851-855). git add -u does not stage untracked files, so a normal task that adds a new source/test file leaves git status --porcelain non-empty and the merge path throws Auto-commit failed... (:859-866). This breaks the expected coordinator flow for a very common class of sub-task.

  2. Coordinator startup overwrites user-owned .mcp.json. StartMCPServer writes a complete config to selectMcpJsonDir(...)/.mcp.json unconditionally (electron/ipc/register.ts:1353-1356). If the project/worktree already has an MCP config, this destroys it instead of merging/restoring. Also, passing { mode: 0o600 } to writeFileSync does not reliably tighten permissions on an existing file.

  3. skipPermissions is still controlled by the coordinator tool call, not only by explicit UI consent. The MCP schema still exposes create_task.skipPermissions (electron/mcp/mcp-tool-list.ts:31-35), the MCP server forwards it (electron/mcp/server.ts:75-80), the REST handler accepts it (electron/remote/server.ts:276-297), and Coordinator.createTask() turns it into --dangerously-skip-permissions (electron/mcp/coordinator.ts:552-556). The "propagate skip-permissions" checkbox only controls the default; a coordinator can still request it per task.

  4. The 0.0.0.0 coordinator/remote server stays up after coordinator close. Coordinator startup binds the shared server to all interfaces (electron/ipc/register.ts:1234-1239). Closing/deregistering the coordinator only calls coordinator?.deregisterCoordinator(...) (:1082-1084); StopMCPServer is a no-op (:1412-1414), and nothing stops remoteServer when the last coordinator deregisters. That leaves the LAN-reachable mutating API running after coordinator mode is no longer active.

  5. Docker sub-task restart still points at the old coordinator container. Docker child tasks persist agentCommand = docker plus agentArgs = exec ... <containerName> ... (electron/mcp/coordinator.ts:617-625), and TaskAITerminal reuses those persisted args on restart (src/components/TaskAITerminal.tsx:215-225). After an app restart, App.tsx reconstructs a new coordinator container name from the newly restored coordinator agent id (src/App.tsx:341-347), so restored Docker child tasks can restart against a container name that no longer exists.

  6. Restart restore still does not actually guarantee MCP is ready before agents resume. loadState() restores tasks and agents first, which can mount TerminalView and call SpawnAgent; only after that does App.tsx call StartMCPServer and hydrate children (src/App.tsx:326-390, src/components/TerminalView.tsx:606-621). That still leaves a race where restored coordinators/children can boot with stale MCP config before the new port/token rewrite lands.

These are all fixable, but they hit the core beta promises: no config loss, safe skip-permission gating, bounded network surface, Docker restart correctness, and reliable restart recovery.

@brooksc
Copy link
Copy Markdown
Contributor Author

brooksc commented May 11, 2026

Update: Security hardening, refactoring, and static analysis tooling

This update adds to the original orchestrator control work across three areas.


Security fixes

P1 — Two-class token model (electron/remote/server.ts)

  • Sub-tasks now receive a separate subtaskToken (restricted to POST /api/tasks/{id}/done) rather than the full coordinator token. Previously a sub-task could use its token to call any REST route (/api/tasks, /merge, /close, /prompt) or authenticate a WebSocket to control another agent's terminal.
  • classifyToken() replaces the old checkAuth(). Coordinator token: full access. Subtask token: only done. WebSocket rejects subtask tokens at the upgrade step.

P1 — Server bind address (electron/remote/server.ts)

  • Server now binds to 127.0.0.1 instead of 0.0.0.0. On most macOS/Linux setups the app was reachable from LAN without authentication.

P1 — Route scoping and validator hardening (electron/remote/server.ts)

  • Added ownedByCallerOrUnscoped check so coordinator IDs can only reach their own tasks via REST/WS.
  • Input validators are now fail-closed: unknown task IDs return 404 rather than silently no-op.

P2 — Replay cache scoping (electron/mcp/replay-cache.ts)

  • Deduplicated signal-done responses were keyed only by requestId, so two different coordinators with the same request ID could share a cached result. Key is now coordinatorTaskId:requestId.

P2 — redactServerUrl in logs (electron/remote/server.ts)

  • Bearer tokens were previously logged in plain text on startup/restart. Semgrep rule console-log-token-variable was added to catch future regressions.

P3 — Docker-mode ordering (electron/ipc/register.ts)

  • fs.copyFileSync and .git/info/exclude writes happened before mcpConfig/mergedMcpJson were computed. A malformed .mcp.json would throw after copying the binary, leaving filesystem residue. All pure computation now precedes any side effects.

Refactoring

Atomic file writes (electron/mcp/atomic.ts)

  • All MCP config writes now go through atomicWriteFileSync / atomicWriteFile (write-to-tmp + rename). A crash mid-write can no longer produce a partially-written config file that silently corrupts a sub-task's MCP environment.

Replay cache extracted (electron/mcp/replay-cache.ts)

  • ReplayCache<T> is a standalone generic class with TTL eviction. The Coordinator class no longer owns inline Map + expiry logic.

Preamble module extracted (electron/mcp/preamble.ts)

  • removePreambleBlock, detectPreambleFiles, filterDiffSections, buildNormalizedPreambleFileDiff, stripPreambleFromBranch are now standalone exports. The Coordinator class is ~200 lines shorter.

Coordinator lifecycle state machine (electron/mcp/types.ts, coordinator.ts)

  • CoordinatorLifecycle: 'starting' | 'ready' | 'closing' | 'closed' makes coordinator state transitions explicit and auditable rather than inferred from presence/absence of fields.

Testing

  • New tests covering all P1/P2 security fixes: subtask token is rejected on non-done routes, coordinator token passes, classifyToken handles query-param and Bearer header forms, replay cache cross-coordinator isolation, ownedByCallerOrUnscoped boundary cases.
  • Atomic write mock strategy: vi.mock('./atomic.js', ...) intercepts the full module so tests assert on the final target path/content without tracking temp-file internals.
  • removePreambleBlock and other preamble helpers are now directly importable in tests without instantiating a Coordinator.
  • Test count: 790 passing, 0 failing.

Static analysis tooling

Added four tools, all running locally (no GitHub Actions required):

Tool Script Purpose
Knip npm run lint:dead Dead code detection
dependency-cruiser npm run lint:arch Architecture boundary enforcement
Semgrep npm run lint:security Custom security rules (9 rules)
Gitleaks npm run lint:secrets Secret / token scanning

Current findings:

  • Knip: 2 dead files (MonacoDiffEditor.tsx, diff-parser.ts), 18 unused exports, 12 unused types — not removed in this PR, reported for follow-up.
  • dependency-cruiser: 7 pre-existing circular dependencies in the store layer; no boundary violations (no src/ importing electron/ main-process code).
  • Semgrep: 0 violations (1 nosemgrep suppression on the Mermaid SVG renderer, which produces its own sanitized output).
  • Gitleaks: clean.

npm run check:static runs typecheck + lint + lint:dead + lint:arch together.


Outstanding: Docker networking design

The P3 fix (ordering) is merged, but the underlying design has a deeper issue that isn't addressed here: the current model assumes one coordinator → one Docker container. When a coordinator spawns multiple sub-tasks, each needs its own container, but they all currently share a single dockerContainerName. The authentication model for container-to-host networking (token passing via environment into an isolated container network) also needs a dedicated design pass. This is tracked as a known issue and is out of scope for this PR.


PS: I'm traveling this week, so if you find any issues I'll resolve them later in the week.

@johannesjo
Copy link
Copy Markdown
Owner

johannesjo commented May 11, 2026

Thank you very much for the update! <3

Reviewed the update range b55dcc9..c80a029 with multiple focused passes. I think these are worth addressing before merge:

  • P1: coordinator token scoping is bypassable. In electron/remote/server.ts:361-378 / 397-529, task-route access falls back to unscoped behavior when X-Coordinator-Id is missing or unregistered. Since all coordinator MCP configs use the same remoteServer.token (electron/ipc/register.ts:1394), a coordinator process that has that token can omit the header and list/control tasks owned by other coordinators. Coordinator-token task routes should require a valid coordinator identity, or use per-coordinator tokens.

  • P1: sub-task signal_done auth is not task-scoped. electron/remote/server.ts:192-193 allows the global subtaskToken to call any /api/tasks/:id/done, and electron/remote/server.ts:521-528 treats a missing coordinator header as unscoped. Any sub-task with access to its MCP config/token can signal completion for another task. This should be bound to the specific task id, ideally with per-task done tokens.

  • P1: atomic writes can fail across filesystems. electron/mcp/atomic.ts:14-17 and 34-37 create temp files in os.tmpdir() and then rename them to the target. That throws EXDEV when /tmp and the worktree/config path are on different mounts, which is common with tmpfs, external project volumes, or Docker/bind-mounted paths. This affects .mcp.json writes at electron/ipc/register.ts:1479 and Docker sub-task config writes at electron/mcp/coordinator.ts:596-597. The temp file needs to live in dirname(filePath).

  • P1: retry after MCP startup error does not spawn the terminal. src/components/TerminalView.tsx:641-653 only installs the ready-state watcher when the initial status is pending; if the component mounts in error, it skips both the watcher and spawnNow(). Clicking Retry at src/components/TerminalView.tsx:743 can move the store state to pending/ready, but the mount block does not re-run, so the overlay disappears and no PTY starts. Install the watcher regardless of initial status, guarded by a spawned flag.

  • P1: child hydration can mark children ready even when coordinator restore failed. src/App.tsx:374-400 uses Promise.allSettled and then hydrates every coordinated child regardless of the coordinator's MCP restore result. The IPC handler at electron/ipc/register.ts:1223-1241 also uses coordinator?.hydrateTask(...) and still emits MCP_TaskHydrated, so children can spawn with stale/broken MCP wiring when the coordinator is missing or in error. Gate hydration on coordinator ready, and make the IPC handler fail if the coordinator is unavailable/unregistered.

  • P2: custom Docker images are not propagated for new coordinator-created sub-tasks. Fresh coordinator creation and retry call IPC.StartMCPServer without dockerImage at src/store/tasks.ts:186-195 and 1176-1185. register.ts:1417 stores only args.dockerImage, and coordinator.ts:628 passes that to docker run, so MCP-created sub-tasks fall back to the default image. The restore path does pass it (src/App.tsx:363), but fresh creation/retry should too.

  • P2: enabling remote access after local MCP can return unreachable network URLs. If MCP starts first without Docker, StartMCPServer binds the shared server to 127.0.0.1 (electron/ipc/register.ts:1342-1343). Later StartRemoteServer just returns the existing server (electron/ipc/register.ts:1018-1027) while exposing WiFi/Tailscale URLs from electron/remote/server.ts:852+; those URLs cannot connect to a loopback-only listener. Manual remote start needs to rebind/restart or otherwise avoid advertising unreachable URLs.

  • P2: existing .mcp.json parallel-code entries are overwritten and deleted. The merge at electron/ipc/register.ts:1405-1408 preserves other server names, but overwrites any pre-existing mcpServers["parallel-code"]. Deregistration then deletes that key at electron/mcp/coordinator.ts:1258-1266. This can remove user-owned or concurrent config. Preserve/restore the previous value, or delete only when the current value still matches the entry this coordinator wrote.

Smaller follow-up: package.json:24-25 adds semgrep/gitleaks scripts, but those CLIs are not in devDependencies; npm ci alone will not make the new lint scripts reproducible unless they are installed another way.

brooksc and others added 14 commits May 14, 2026 11:21
Adds HTTP REST endpoints for task management (create, list, get, prompt,
wait, diff, output, signal-done, merge, close) plus an MCP log ring buffer.
Also adds AGENTS.md (project context for AI agents) and rebuild-integration.sh.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… diff truncation metadata, gitIsolation schema

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- wait_for_signal_done redesign: removed taskId parameter; returns
  whichever sub-task completes next with a `remaining` count so the
  coordinator can loop naturally without guessing order
- merge_task / review_and_merge_task: operate in the sub-task's worktree
  so the target branch doesn't need to be checked out in the main repo
- Docker: sub-agents spawned via docker exec into the coordinator's
  container when coordinator runs with Docker mode enabled
- Preamble injection: writes signal_done instructions to the right file
  per agent type (AGENTS.md / GEMINI.md / .agent.md /
  settings.local.json) in the worktree rather than the project root;
  file is cleaned up by close_task
- Notification system: suppress pending notifications during signal waits
  to prevent double-notify; escalate to orphaned notification after 10
  missed autofire attempts; clear stale notifications on coordinator
  deregistration
- get_task_output: append truncation sentinel when scrollback exceeds
  20 000 chars so the coordinator knows it saw partial output
- wait_for_idle: return reason field (idle / exited / human_control)
- Autofire: countdown UI; don't treat previous staged text as user edit
- Persistence: persist mcpConfigPath across app restarts; rewrite config
  on startup with current port/token
- Tests: 63 unit tests in coordinator.test.ts; git mergeWorktreePath test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Coordinators can drive sub-tasks via send_prompt; users can reclaim a
task at any time with Take Control and hand it back with Release Control.
This PR extends that model to the coordinator task itself.

Control bar (TaskPanel)
- Renamed "Pause/Resume coordinator" → "Take Control / Release Control"
  for clearer ownership language
- Extended to coordinator tasks (coordinatorMode) in addition to sub-tasks
  (coordinatedBy): the same bar now appears on the coordinator's own panel
- Auto-mode label: "Auto mode" for coordinator tasks, "Coordinator driving"
  for sub-tasks

Input locking (PromptInput + TerminalView)
- PromptInput textarea is disabled when controlledBy === 'coordinator';
  onKeyDown/onInput guards provide belt-and-suspenders protection
- TerminalView: term.options.disableStdin toggled reactively so xterm
  stops forwarding keystrokes to the PTY (the real input path users take)
- Autofire interval skips ticks (no miss count) when controlledBy is human

Coordinator task panel UX
- Invisible pointer-events overlay gives coordinator task the same
  cursor:not-allowed treatment sub-tasks already had
- Overlay click propagates to maybeShowControlHint, enabling the
  discoverability tooltip (shown ≤3 times, dismissible, "Don't show again")
- PromptInput panel hidden (display:none, minSize getter → 0) in auto mode
  while keeping the component mounted so autofire keeps running

Control state persistence
- controlledBy persisted in PersistedTask; restored on app restart
- Default: 'coordinator' for coordinator/coordinated tasks, undefined otherwise
- setTaskControl skips the MCP_ControlChanged IPC for coordinator tasks
  (backend only knows sub-tasks, not the coordinator task itself)

Bug fixes
- MCP_TaskCreated handler now sets controlledBy:'coordinator' on new
  sub-tasks (was missing, leaving textarea unlocked)
- Closing a coordinator task clears controlledBy on all children so
  their textareas re-enable when the control bar disappears
- setTaskControl calls saveState() so Take Control survives autosave

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- design.md: comprehensive write-up of the coordinator mode feature —
  architecture diagram, Coordinator class internals, MCP tool surface,
  wait_for_signal_done redesign rationale, notification system, control
  handoff, autofire, discoverability hint, settings, security, Docker,
  New Task Dialog changes, and test coverage summary
- TODOS.md (renamed from KNOWN-TODOS.md): easy coordinator test tasks,
  medium known issues (get_task_output truncation, merge_task live run,
  autofire timing edge case, MCP config staleness), hard backend
  hydration issues, and frontend test requirements (items 12–16)
- Removed AGENTS.md from project root (preamble is now written per-agent
  into the worktree by the coordinator, not the project root)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Verifies that newly created sub-tasks have controlledBy set to
'coordinator' and coordinatedBy set to the coordinator task ID,
with a regression guard ensuring controlledBy is always defined.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…l tick

Extracts the autofire tick decision logic into processAutoFireTick (autofire-tick.ts)
so it can be tested as a pure function without mounting the SolidJS component.

Tests verify:
- controlledBy==='human' returns 'paused' without touching the miss counter
- controlledBy==='coordinator' increments miss count and fires when prompt visible

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extracts disableStdin logic into computeDisableStdin() and adds
tests for coordinator-controlled (true), human-controlled (false),
and undefined controlledBy (false) scenarios.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The merge_task tool merges into baseBranch (the coordinator's feature
branch), not necessarily main. Update both the MCP tool description and
coordinator preamble to reflect the correct behavior.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The coordinator preamble was being injected twice — once in NewTaskDialog.tsx
(from src/lib/coordinator-preamble.ts) and again in src/store/tasks.ts
(from src/store/coordinator-preamble.ts), causing coordinators to start with
duplicated and contradictory instructions.

Remove src/lib/coordinator-preamble.ts and its prepend in NewTaskDialog.tsx,
consolidating all rules into the store version as single source of truth.
The store version now includes the BAD/GOOD task-assignment guidance, baseBranch
reminder, max-3-concurrent rule, file-overlap guidance, and the verify-before-assigning
rule from the deleted lib version.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…_merge_task

Both handlers now append ' [NOT COMMITTED — will be auto-committed on merge]'
to any file entry where committed === false, matching the safety context the
file-object type already carries.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
brooksc and others added 18 commits May 14, 2026 11:34
Extract validateStartMCPServerArgs from the handler into a testable
exported function, then add Layer 4 tests covering all five rejection
paths (non-absolute projectRoot/worktreePath, ".." traversal, non-string
agentArgs elements, shell-special dockerContainerName). Each rejection
test also spies on fs.writeFileSync/copyFileSync to confirm no I/O
occurs before the error is thrown. Two positive tests verify optional
fields (worktreePath, dockerContainerName) can be omitted.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…johannesjo#39

- johannesjo#33: restart round-trip — hydrateTask rewrites config with new subtaskToken
  (not coordinator token) when setMCPServerInfo has already run; and
  waitForIdle resolves after agent output fires post-hydration
- johannesjo#36: mcpConfigPath directory scoping — path traversal rejected, wrong-dir
  rejected, correct host tmpdir accepted, Docker dirname(serverPath) accepted,
  Docker wrong-dir rejected
- johannesjo#39: per-task close isolation — closing task-1 deletes only its config file;
  task-2 config and task entry remain untouched

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t after preamble block

wait_for_signal_done (electron/mcp/client.ts):
- On network-level TypeError (fetch failed, ECONNRESET), retry with exponential backoff
  up to 10 attempts (max 30 s per retry). HTTP errors (4xx/5xx) propagate immediately.
- Since signal_done writes durable server-side state, a retry returns the already-signaled
  result instantly — coordinator no longer stalls when a transient connection drops.

Preamble stripping (electron/mcp/coordinator.ts):
- Add removePreambleBlock(): removes only the <sub-task-mode>...</sub-task-mode> block
  and its surrounding blank-line separators; preserves all content before and after.
- stripPreambleFromBranch (merge path): use removePreambleBlock instead of slicing to EOF,
  fixing data loss when a sub-task legitimately edits a preamble-bearing file after the
  injected block.
- getTaskDiff (review path): replace whole-section exclusion with buildNormalizedPreambleFileDiff,
  which generates a diff between the base version and the preamble-stripped worktree version.
  Files with only preamble changes produce an empty normalized diff and are still excluded;
  files with real changes beyond the preamble now appear in the coordinator's review diff.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… base

wait_for_signal_done retry safety (client.ts, coordinator.ts, server.ts):
- client.ts generates a stable requestId (UUID) per call and includes it in every
  retry attempt, so the server can replay the result instead of blocking again
- coordinator.waitForSignalDone() accepts optional requestId; on first delivery it
  stores the result in recentlyDelivered (120 s TTL) keyed by requestId; on retry
  with the same id it returns the cached result immediately, avoiding a wait on an
  already-consumed signal
- cacheDeliveredResult() evicts expired entries on each write to prevent unbounded growth
- remote/server.ts extracts requestId from the request body and forwards it

Preamble diff base (git.ts, coordinator.ts):
- Export getDiffBaseSha() from git.ts — same merge-base + detectDiffBase logic used
  by getAllFileDiffs(), so all three callers (files, raw diff, normalized preamble diff)
  use identical base-selection semantics including fallback to detected main
- getTaskDiff() computes baseSha once and passes it to buildNormalizedPreambleFileDiff()
- buildNormalizedPreambleFileDiff() now accepts baseSha directly, eliminating the
  incorrect 'git merge-base HEAD HEAD' fallback when baseBranch is undefined

Tests (coordinator.test.ts):
- waitForSignalDone requestId replay: same id returns cached result; different id times out
- removePreambleBlock unit tests: content after end marker preserved, before preserved,
  pure-preamble empty, malformed falls back, no-marker unchanged (6 cases)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… tests for johannesjo#34/johannesjo#37/johannesjo#40

- Extract validateBranchName into electron/mcp/validation.ts; apply at MCP and REST boundaries (johannesjo#38)
- Fail-closed on malformed .mcp.json in register.ts: parse/validate before writing temp config (johannesjo#42)
- Show staged coordinator prompt text and countdown in read-only TaskPanel without "Take Control" (johannesjo#44)
- Add getTaskDiff preamble-bearing file normalization tests (AGENTS.md + settings.local.json) (johannesjo#34)
- Add closeTask IPC ordering tests: MCP_CoordinatedTaskClosed and MCP_CoordinatorDeregistered (johannesjo#37)
- Add .mcp.json merge/cleanup tests via deregisterCoordinator (johannesjo#40)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace local validateBranchName in register.ts and coordinator.ts with the shared
  validator from validation.ts — control-character and whitespace checks now apply on
  all entry paths (IPC, MCP, REST, coordinator-internal)
- Move .mcp.json read/parse to before registerCoordinator/setMCPServerInfo so a
  malformed file fails fast without leaving partial coordinator state
- Key waitForSignalDone replay cache by coordinatorTaskId:requestId to prevent
  cross-coordinator result leakage; add regression test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ferral)

- removePreambleBlock: return content unchanged when end marker is missing instead
  of stripping to EOF — eliminates data loss risk on malformed injected blocks
- persistence.ts: set mcpStartupStatus 'pending' for coordinator and coordinated
  tasks on restore so TerminalView defers SpawnAgent until StartMCPServer/hydrateTask
  completes and rewrites config with the fresh session token
- Update test: malformed-block expectation now asserts content preservation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…it cache warm, temp file cleanup

- detectPreambleFiles: converted to async (fs/promises) to avoid blocking main thread
- stripPreambleFromBranch: converted to async (fs/promises) for same reason
- getTaskDiff: serialize baseSha before parallel getChangedFiles/getAllFileDiffs
  so both hit the warmed cache on the second call
- MCP_CoordinatorDeregistered handler: delete temp sub-task config file on deregistration
- persistence.ts: set mcpStartupStatus: 'pending' for coordinator/sub-task tasks on restore
- coordinator.test.ts: updated preamble-bearing tests to mock fsReadFile (async) for
  detectPreambleFiles while keeping existsSync/readFileSync for buildNormalizedPreambleFileDiff

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ardening

P1-1  StartMCPServer binds 127.0.0.1 by default; 0.0.0.0 only in Docker mode.
      StartRemoteServer keeps 0.0.0.0 (explicit user action for WiFi/mobile).

P1-2  Coordinator token never returned to renderer. Removed 'token' field from
      StartRemoteServer, GetRemoteStatus, and StartMCPServer IPC returns.
      GetMCPStatus.serverUrl now uses getMCPRemoteServerUrl (no token).
      RemoteAccess store type and initial state updated accordingly.

P1-3  Third mobileToken generated at startup. wifiUrl/tailscaleUrl/url all
      embed mobileToken instead of coordinator token. Mobile callers are
      classified as 'mobile' and restricted to GET /api/agents, GET /api/tasks
      (read-only). Only coordinator token grants WebSocket + write routes.

P2-1  redactServerUrl() exported from server.ts; used in register.ts console.warn
      to strip token query param before logging.

P2-2  validateBranchName rejects path traversal (/../) and shell metacharacters
      (`$(){}[]<>\\'*?!#;|&") in addition to existing control-char checks.

P2-3  MCP_HydrateCoordinatedTask now calls validateBranchName on branchName and
      baseBranch (if present) — matches validation enforced by createTask.

P2-4  PARALLEL_CODE_MCP_TOKEN added to SpawnAgent ENV_BLOCK_LIST so a crafted
      IPC call cannot override the MCP auth token in spawned agents.

P2-5  X-Coordinator-Id header is now verified against registered coordinators
      (isRegisteredCoordinator) before being used for task scoping. Unregistered
      IDs are silently ignored — caller sees unscoped view.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…inator lifecycle state machine

- atomic.ts: atomicWriteFileSync/atomicWriteFile (write-to-tmp + rename) eliminates torn
  MCP config files on crash; all four config write sites now use it
- replay-cache.ts: ReplayCache<T> replaces inline recentlyDelivered Map; TTL-based with
  eviction on write; key includes coordinatorTaskId to prevent cross-coordinator replay
- preamble.ts: extracts removePreambleBlock, detectPreambleFiles, filterDiffSections,
  buildNormalizedPreambleFileDiff, stripPreambleFromBranch as standalone tested exports;
  removes five private methods from Coordinator god class (~150 lines)
- types.ts: adds CoordinatorLifecycle = 'starting'|'ready'|'closing'|'closed'; transitions
  enforced in registerCoordinator/setMCPServerInfo/deregisterCoordinator
- register.ts (P3): moves mcpConfig + mergedMcpJson computation before Docker copyFileSync
  so .mcp.json merge logic failures leave no filesystem residue
- coordinator.test.ts: atomic.js mocked directly; removePreambleBlock tested via preamble.ts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…, Gitleaks)

- Knip: dead-code detection; found 2 dead files + 18 unused exports (not removed, reported)
- dependency-cruiser: enforces src/→electron boundary, no-circular, no-orphans
- Semgrep: 9 custom rules covering token-in-URL, innerHTML XSS, eval, shell injection, pkill-f
- Gitleaks: secret scanning with custom rules for MCP tokens and bearer tokens
- Scripts: lint:dead, lint:arch, lint:security, lint:secrets, check:static in package.json
- Pre-commit: Husky runs lint-staged + check on every commit

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tokens, atomic EXDEV, hydration gating

P1 security:
- Require X-Coordinator-Id for coordinator-class task routes (no unscoped fallback)
- Per-task X-Done-Token header for signal_done; shared subtaskToken no longer sufficient
- atomicWriteFile: use dirname(filePath) not os.tmpdir() to avoid cross-filesystem EXDEV
- TerminalView: install MCP ready-state watcher regardless of initial status (fix retry spawn)
- hydrateTask: throw if coordinator not registered; gate child hydration on coordinator ready

P2 correctness:
- Forward dockerImage in both createTask and retryTaskMcpStartup StartMCPServer calls
- StartRemoteServer: rebind loopback-only MCP server to 0.0.0.0 on explicit remote start
- Preserve pre-existing .mcp.json parallel-code entry; restore on deregistration

Minor: semgrep/gitleaks scripts check for installation, print instructions if missing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove TODOS.md, design.md, and openspec coordinator change proposal.
Fix rebase-introduced duplicate declarations in focus.ts, types.ts,
and missing questionActive param in PromptInput.test.ts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- navigation.test.ts: add focusedPanel to mock store and deep-path setStore
  handler to match setActiveTask's usage of store.focusedPanel
- focus.test.ts: update ai-terminal panel expectation to include agent ID
  suffix since normalizeTaskPanel now resolves 'ai-terminal' → 'ai-terminal:<id>'

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@brooksc brooksc force-pushed the feature/orchestrator-control-v2 branch from c80a029 to aba40fc Compare May 14, 2026 18:44
@brooksc
Copy link
Copy Markdown
Contributor Author

brooksc commented May 14, 2026

Round-2 review fixes

This update addresses all 9 issues from @johannesjo's review of the b55dcc9..c80a029 range.

P1 Security

Coordinator token scoping — coordinator-class tokens now require an X-Coordinator-Id header on all task routes. Previously the header was optional and missing/unregistered IDs fell back to unscoped access, letting any coordinator process with the shared remote token read/control tasks owned by other coordinators.

Per-task done tokenssignal_done now requires a per-task X-Done-Token header in addition to the subtask bearer token. Each sub-task's MCP config receives a unique PARALLEL_CODE_MCP_DONE_TOKEN at spawn time; the /done route verifies it with a constant-time compare. The shared subtask token alone is no longer sufficient to signal completion for an arbitrary task.

Atomic write EXDEVatomicWriteFile now creates temp files in dirname(filePath) instead of os.tmpdir(). The previous os.tmpdir() approach threw EXDEV whenever /tmp and the target path (worktree, Docker bind-mount, external volume) were on different filesystems.

MCP retry spawnTerminalView now installs the MCP ready-state watcher unconditionally on mount (guarded by a spawned flag), rather than only when the initial status is pending. Previously, mounting in error state skipped both the watcher and spawnNow(), so clicking Retry would clear the overlay but never start the PTY.

Child hydration gatinghydrateTask now throws if the coordinator is not registered, and App.tsx skips child hydration when the coordinator's MCP restore result is not ready. Previously Promise.allSettled continued hydrating children even when the coordinator had failed to restore, leaving them with broken MCP wiring.

P2 Correctness

dockerImage propagationcreateTask and retryTaskMcpStartup now both forward dockerImage to StartMCPServer, so coordinator-created sub-tasks use the configured image rather than falling back to the default.

Remote server rebind — when StartRemoteServer is called after MCP has already started a loopback-only server, it now rebinds to 0.0.0.0 before advertising WiFi/Tailscale URLs. Previously it returned the existing loopback listener with unreachable network URLs.

Preserve existing .mcp.json entry — the coordinator now captures any pre-existing parallel-code entry in .mcp.json before overwriting it, and restores it on deregistration. Previously deregistration unconditionally deleted the key, removing user-owned or concurrent config.

Minor

package.json semgrep/gitleaks scripts now check whether the CLI is installed and print setup instructions if not, rather than failing silently.

@johannesjo
Copy link
Copy Markdown
Owner

Wow! That is a lot of commits! Thank you very much! I have look :)

@johannesjo
Copy link
Copy Markdown
Owner

Code review (round 3)

Independent re-review covering the round-2 hardening commits. The token scoping / X-Done-Token / atomic-write changes look solid and the negative-test coverage in coordinator-scoping.test.ts is excellent. Findings below are what I would still want addressed before merge, grouped by severity.

P1 — should fix before merge

# Issue Suggested fix
1 hydrateTask does not validate opts.id. A tampered persisted state with id="../../tmp/x" flows into getSubTaskMcpConfigPath to compute the "expected" path; the safety check at coordinator.ts:1133-1142 compares against that same un-validated id, so the strings match and a 0600 token file is written outside os.tmpdir(). Local-only (state-file write already implies privilege), but it defeats the explicit "won't write outside tmp" intent of the safety check. UUID-validate opts.id at the top of hydrateTask (and MCP_HydrateCoordinatedTask IPC handler in register.ts:1254).
2 deregisterCoordinator race with in-flight createTask. createTask awaits createBackendTask (worktree spawn) before inserting into this.tasks. If deregisterCoordinator runs in the gap, the cleanup loop at coordinator.ts:1327 doesn't see the new task; the subsequent this.tasks.set(...) leaves an orphan with no live coordinator state, and setMCPServerInfo's rewrite loop never fires for it. After each await in createTask, re-check this.coordinators.has(coordinatorId); if gone, run cleanup and throw.
3 MCP HTTP server binds 0.0.0.0 whenever Docker is used (register.ts:1366). Coordinator-class token then reachable from LAN even when user only wanted local Docker. At minimum surface a "remote access active" banner; ideally bind to docker bridge/host.docker.internal only.
4 hydrateTask subscriber/decoder leak on failure (coordinator.ts:1174-1213). tailBuffers/decoders/subscribers are populated unconditionally; if safeMcpConfigPath write throws (catch swallows), the entry stays in the maps for an agentId that never spawns. Add cleanup in the catch block.
5 signalDone mislabels non-exited task status (coordinator.ts:1483). Tasks still in 'creating'/'running' that call signal_done get a "ready for review" notification while the renderer state may still be inconsistent. Only mark as 'idle' when task.status === 'idle'; otherwise stamp 'running' and queue normally.

Refuted from round-3 draft

  • orch.signalDone(taskId) lacks try/catchCoordinator.signalDone (line 1437) is synchronous, returns boolean, no throw paths. Current code is fine.

P2 — should fix

# Issue Suggested fix
6 waitForSignalDone retry can overshoot timeoutMs by up to 30 s (client.ts:149). Pre-attempt remaining<=0 short-circuit happens at the next iteration, after the sleep. delayMs = Math.min(delayMs, remaining ?? Infinity).
7 previousMcpParallelCode clobbers concurrent user edits to .mcp.json on deregister (coordinator.ts:1287). Only restore if the current value still equals what we wrote at startup.
8 validateBranchName allows leading/trailing /, @{, .lock, //, leading .. All rejected by git check-ref-format later, but adding them upstream keeps validation centralised. Extend the regex check.
9 atomicWriteFile does not fsync before rename (atomic.ts:35). Power loss → empty token file at the final path → silent auth failures on restart. await fh.sync() on the temp fd before rename, ideally also dir fsync.
10 removePreambleBlock returns content unchanged when END marker missing (preamble.ts:23). The WIP: auto-commit before merge would then commit the preamble into branch history. Log warning + fallback to drop everything from <sub-task-mode> to EOF.
11 createTask preamble restore is fire-and-forget (coordinator.ts:696-706). On error, the IIFE may run after cleanupTask removes the worktree → restore writes into a stale path. await the restore before throwing.
12 coordinator constructor PTY listeners have no teardown (lines 98, 141). Comment says "lives for app lifetime", but if enableCoordinatorMode is ever called twice (tests, future reload), listeners leak. Make idempotent or retain onPtyEvent unsub handles.

Test gaps (would have caught #1, #2, #6, #8)

  • hydrateTask rejects id containing ..//
  • validateBranchName rejects leading /, trailing /, @{, .lock
  • createTask vs concurrent deregisterCoordinator leaves no orphan in this.tasks
  • waitForSignalDone honors timeoutMs under network flapping mock

Process note

18 702 additions / 79 files in one PR is tough to review with confidence. The security-fix series (atomic writes, X-Coordinator-Id, X-Done-Token, retry wrapper) and the feature additions (control banner, expanded create_task options) are each defensible standalone landings — splitting future work would help reviewability.

Overall: thoughtful, defensively-designed work. The round-2 hardening addressed the worst issues, and the remaining P1 items are mostly tightening rather than fundamental rework.

@brooksc
Copy link
Copy Markdown
Contributor Author

brooksc commented May 15, 2026

@johannesjo thank you for your feedback and patience as I work through this.

Yes this is a very large PR, I didn't realize it would be so large when I started this. Given where we are, I'm happy to withdraw this and resubmit as a series of smaller PRs - or otherwise I'll continue to submit only fixes to any issues you find. Any future PRs I'll break down into smaller chunks and run by you in discussions any major planned changes.

I'll submit an update shortly with fixes for the issues you found - let me know if you'd prefer me to withdraw and submit smaller PRs.

At this point the feature is useful and I've been using it myself to give the coordinator a long list of items to dispatch and then sitting back, monitoring progress and waiting until it's all completed. But I have ideas to continue to refine this and will discuss that outside of this PR.

…ync, branch hardening

P1 fixes:
- UUID-validate id/coordinatorTaskId in MCP_HydrateCoordinatedTask IPC handler
  (prevents tampered persisted state from directing token writes outside tmpdir)
- Fix createTask/deregisterCoordinator race: re-check coordinator presence after
  createBackendTask await; clean up worktree and throw if coordinator gone
- Log warning when Docker mode forces 0.0.0.0 bind (LAN-reachable API)
- Fix hydrateTask subscriber/decoder leak: wrap output-monitoring setup in
  try/catch that removes tailBuffers/decoders/subscribers/task on failure
- Fix signalDone mislabeling: skip maybeQueueReviewNotification for 'creating'
  status (renderer state inconsistent before agent has spawned)

P2 fixes:
- Cap waitForSignalDone retry sleep to remaining timeout (prevents 30s overshoot)
- Track writtenMcpParallelCode in CoordinatorState; deregister only
  restores/deletes parallel-code key when current value matches what we wrote
- Harden validateBranchName: reject leading/trailing /, .lock suffix, //, leading .
  (all rejected by git check-ref-format)
- Add validateUUID export to validation.ts
- atomicWriteFileSync: use openSync/writeSync/fsyncSync/closeSync for durability
- atomicWriteFile: use open/fh.writeFile/fh.sync/fh.close for durability
- removePreambleBlock: drop to EOF when END marker missing (prevents injected
  coordinator instructions being committed into branch history)
- Await preamble restore in createTask catch block (fire-and-forget could race
  with cleanupTask removing the worktree)
- Store onPtyEvent unsub handles in constructor; call prior handle before
  re-registering to prevent listener leak on repeated constructor calls

Tests: 12 new tests covering UUID validation, extended branch-name rules,
createTask/deregister race, waitForSignalDone timeout cap

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@brooksc
Copy link
Copy Markdown
Contributor Author

brooksc commented May 15, 2026

Round 3 review — all issues addressed ✅

Thanks for the thorough review. Here's a summary of every point raised and what was done.


P1 — Critical

P1-1 · UUID validation in hydrateTask

Added validateUUID to electron/mcp/validation.ts (RFC-4122 regex, exported). Validation is applied at the IPC handler boundary in electron/ipc/register.ts for both args.id and args.coordinatorTaskId — the actual trust boundary where external input enters persisted state. It is intentionally not placed inside hydrateTask itself, which is called by internal code that always uses crypto.randomUUID().

P1-2 · Race: coordinator deregistered while createBackendTask awaits

After await createBackendTask(...) returns, we now check whether the coordinator is still in the coordinators map. If it was deregistered during the await, we fire-and-forget a deleteTask cleanup and throw so the tool call surfaces an error instead of leaving a dangling task.

P1-3 · Docker mode binds MCP server to 0.0.0.0

The bind address is now args.dockerContainerName ? '0.0.0.0' : '127.0.0.1'. A console.warn is emitted in Docker mode to make the exposure visible in logs.

P1-4 · Resource leak when output monitoring setup fails

The tail-buffer / decoder / subscriber setup is now wrapped in a try/catch. On failure, all three maps are cleaned up and the task entry is removed before re-throwing.

P1-5 · signalDone skips notification when task is in 'running' state

The guard was changed from status === 'idle' || status === 'exited' to status !== 'creating'. Workers that call signal_done while still in 'running' state (the normal case) now correctly trigger the review notification.


P2 — Important

P2-6 · atomicWriteFileSync — missing fsync before rename (sync path)

Rewrote to use the low-level openSync / writeSync / fsyncSync / closeSync sequence so the kernel flushes the data before the rename. The error path closes the fd and unlinks the temp file.

P2-6 · atomicWriteFile — missing sync() before rename (async path)

Switched to the file-handle API (open / fh.writeFile / fh.sync / fh.close / rename) so fsync is called before the rename on the async path too.

P2-7 · Track what we wrote to .mcp.json to detect concurrent edits before restoring

setMcpJsonInfo now accepts and stores a writtenMcpParallelCode field on coordinator state. During deregistration, we compare the current on-disk value against what we wrote. If they differ (user edited the file), we leave it untouched. If they match (or the field was never set, for backward compat), we restore or remove as before.

P2-8 · Branch-name validator missing several git-ref rules

Added checks for: starts with ., starts with /, ends with /, ends with .lock, contains @{, contains //. These are the remaining cases from git check-ref-format.

P2-9 · Retry-sleep in MCP client can overshoot the timeout

delayMs is now Math.min(1_000 * 2 ** attempt, 30_000, remainingAfterFail ?? Infinity) so the sleep is capped to the remaining budget.

P2-10 · Preamble: missing END marker should not silently preserve coordinator instructions

Changed the fallback from returning content unchanged to dropping everything from the START marker to EOF, preventing coordinator instructions from being committed into branch history. A console.warn is emitted.

P2-11 · deregisterCoordinator — preamble restore is not awaited

The fsWriteFile / fsUnlink calls are now inside an await-ed try/catch block so errors don't become unhandled rejections and the restore completes before the coordinator is removed from state.

P2-12 · PTY event listeners registered multiple times if coordinator is re-initialized

Added unsubPtyExit and unsubPtySpawn instance fields. Each registration now calls the prior unsubscribe handle before registering a new listener, making it safe for re-initialization.


Test coverage

All 12 fixes have corresponding tests. The suite grew from 794 → 806 passing tests. TypeScript strict-mode, ESLint (0 warnings), Prettier, compile, and build:mcp all pass clean.

brooksc and others added 2 commits May 14, 2026 23:09
…ing doc)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ordinator+Docker active

On Linux, sub-task containers use --network host so they share the host's
loopback — bind to 127.0.0.1 instead of 0.0.0.0, removing LAN exposure.

On macOS, Docker Desktop routes sub-task traffic through its internal VM
network via host.docker.internal, which requires 0.0.0.0 on the host side.
Binding to just the Docker bridge interface is not practical: Docker Desktop
does not expose a stable docker0 equivalent on macOS and the internal host
IP is documented as variable. So 0.0.0.0 remains necessary on macOS.

Adds a visible warning in two places when coordinator + Docker is active on
macOS: a banner in the NewTaskDialog at task creation time, and a persistent
note in the TaskPanel coordinator control bar while the task is running.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@brooksc
Copy link
Copy Markdown
Contributor Author

brooksc commented May 15, 2026

P1-3 follow-up: Docker + coordinator MCP bind address

Addressed in commit 3eec8af. Here's the full picture.

What was done

Linux (--network host): sub-task containers share the host's network namespace, so 127.0.0.1 inside the container IS the host's loopback. Changed the bind address from 0.0.0.0 to 127.0.0.1 on Linux — clean improvement, LAN exposure removed entirely.

macOS: two visible warnings added:

  • A banner in NewTaskDialog when coordinator + Docker are both selected at task creation time
  • A persistent note in the TaskPanel coordinator control bar while the task is running

Both are scoped to isMac so they don't show up on Linux where the concern doesn't apply.

Why a tighter bind on macOS isn't straightforward

On macOS, Docker Desktop runs in a Linux VM. Sub-task containers connect to the host via host.docker.internal, which Docker Desktop resolves to the Mac's IP on its internal virtual network adapter. For the host to accept those connections, it must listen on that virtual interface — and the only reliable way to do that is 0.0.0.0.

The alternative would be to discover the Docker Desktop bridge IP at runtime and bind to it specifically. The options are all fragile:

  • os.networkInterfaces() — Docker Desktop's virtual adapter has no stable name; it has changed across Docker Desktop versions and does not carry a "docker" label
  • docker network inspect bridge — async shell exec at MCP startup time, adds latency, and Docker Desktop on macOS does not expose a docker0 equivalent the way Linux does
  • Hardcode 192.168.65.1 — Docker Desktop has changed this subnet before and documents host.docker.internal precisely because the IP is not stable

The security exposure is real but limited: the MCP traffic itself (sub-task → host.docker.internal → host) routes entirely through Docker Desktop's internal virtual network adapter and never traverses the physical LAN. What 0.0.0.0 adds is that the port is also reachable from other hosts on your LAN — they can connect to it, but cannot do anything useful without the 192-bit token. The warnings now make this tradeoff visible to the user rather than silent.

If a more robust solution for macOS becomes possible (e.g. Docker Desktop exposing a stable bridge interface or a host-gateway bind target), that would be worth revisiting.

@johannesjo
Copy link
Copy Markdown
Owner

Code review (round 4)

Thanks for the fast turnaround. I ran an independent multi-pass review over the round-3 fix commits (2b21610, 3eec8af). Of the 12 claimed fixes: 8 are clean, P1-2 and P2-7 need tightening, and P1-4 and P2-12 need rework before merge. Per-item verdicts below.

P1 — round-3 items

# Item Verdict
P1-1 hydrateTask UUID validation Correct. Placed at the MCP_HydrateCoordinatedTask IPC handler — verified that's the only caller of coordinator.hydrateTask, so it's the right trust boundary, and both id and coordinatorTaskId are gated; the regex blocks all traversal chars. Nit: the comment says "v4" but the regex accepts any UUID-shaped hex (harmless — still safer than v4-strict). Optional: a one-line contract comment on hydrateTask so a future second caller can't silently reopen this.
P1-2 createTask/deregisterCoordinator race ⚠️ Half-done. The race itself is correctly closed — the re-check is atomic (no await between coordinators.has() and this.tasks.set). But the cleanup deleteTask(...).catch(() => {}) (coordinator.ts:~447) silently swallows a failed worktree/branch removal — exactly the invisible orphan this round is meant to eliminate, and inconsistent with the codebase's own pattern (cleanupTask surfaces the identical failure via console.warn + the MCP_TaskCleanupFailed channel). Route it through the same channel, or at minimum log it.
P1-3 Docker 0.0.0.0 bind Exceeds the ask. Linux --network host127.0.0.1 removes LAN exposure entirely (verified containers run --network host); macOS 0.0.0.0 is genuinely unavoidable, correctly token-protected (constant-time compare, never logged), and now warned in NewTaskDialog + TaskPanel.
P1-4 hydrateTask subscriber/decoder leak Misplaced — the reported failure still happens silently. The new try/catch (coordinator.ts:~1188-1234) wraps Map.set / new TextDecoder() / subscribeToAgent — none of which throw (the last has its own inner catch). The failure I flagged is the atomicWriteFileSync(safeMcpConfigPath, …) at coordinator.ts:~1177, which has its own swallowing catch and runs before the maps populate. A token-write failure is still swallowed and leaves a task in this.tasks with no working MCP config and no surfaced error. Handling needs to live in (or propagate out of) that token-write catch, not around the map setup.
P1-5 signalDone status mislabel Correct. The status !== 'creating' guard is sound, and treating an explicit signal_done from a running/idle task as ready-for-review is better semantics than fabricating a running status. Reasonable, intentional divergence from the literal suggestion.

P2 — round-3 items

# Item Verdict
P2-6 retry-sleep overshoot cap ✅ Correct (Math.min(…, remainingAfterFail ?? Infinity)).
P2-7 .mcp.json concurrent-edit detection ⚠️ JSON.stringify(current) === JSON.stringify(weWrote) is key-order sensitive. Common path is safe, but a Prettier-on-save / jq / IDE "sort keys" pass over .mcp.json makes it misfire → the tool refuses to remove its own entry → permanent stale parallel-code server (dead port/token) in the user's file. Use a sorted-key compare or a marker-field check ("is our token still in env?").
P2-8 branch-name hardening ✅ Correct. Minor: the @{ check is dead (the shell-metachar regex rejects { earlier) — input is still correctly rejected, just redundant.
P2-9 atomicWriteFile fsync ✅ File fsync is correct on both paths. Two notes: the docstring now overclaims rename durability (without a directory fsync after rename, a crash can still lose the rename — soften the comment or add the ~3-line dir fsync); and the sync path's writeSync isn't looped, so it's a latent partial-write bug for large buffers (fine for these tiny token files).
P2-10 preamble drop-to-EOF ✅ Correct + console.warn; test updated.
P2-11 awaited preamble restore ✅ Correct — race with cleanupTask removed.
P2-12 PTY listener teardown Dead code — doesn't fix the leak. unsubPtyExit/unsubPtySpawn are instance fields; this.unsubPtyExit?.() at the top of the constructor is always null on a fresh instance, and a second new Coordinator() is a different object that can't reach the prior instance's handle. Production is fine only because enableCoordinatorMode has if (coordinator) return; (real singleton) — so this addresses neither the tests nor the reload scenario it claims to, and coordinator.test.ts's new Coordinator() per beforeEach now leaks a PTY listener per test. Either add a real dispose() called in test teardown, or drop the machinery and document that the singleton guard is the guarantee (the original comment was honest).

Cross-cutting (not a blocker): the new synchronous fsync makes atomicWriteFileSync block the main process, and setMCPServerInfo calls it once per sub-task in a loop (N serial blocking fsyncs in one IPC handler). Fine at ~3 workers; if you push worker counts up, move those callers to the async atomicWriteFile you already fixed.

Merge readiness: the branch is currently in conflict with main and the quality check is pending — it'll need a rebase + green CI before it can land regardless of the above.

On withdrawing vs. continuing: let's do the split. The security series here is solid and self-contained — I'd rather land that behind experimental.coordinatorMode as its own PR than keep growing a 19k-line / 79-file branch where each independent pass surfaces a new misplaced fix. Could you carve out token-scoping / X-Done-Token / atomic-write / bind-address as PR #1 (with P1-2, P1-4, P2-12 corrected), then follow with the control-banner + expanded create_task options separately? It's clearly useful work — I want it in; smaller diffs just let me review with the confidence it deserves.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants