feat(mcp): coordinator agent with task visibility, pilot/co-pilot control, and expanded options#100
feat(mcp): coordinator agent with task visibility, pilot/co-pilot control, and expanded options#100brooksc wants to merge 116 commits into
Conversation
2e523b0 to
b436f19
Compare
|
@cledoux95 as you initiated this I'd love to get your thoughts on my building further on your work. Here's what I'm trying to solve for.
What I'm hoping to get to:
1-3 is working here (although the take control/return control may need some work) I'm thinking I need some webhook or way for agentN to notify the coordinator when it's done (hook...?) Anyway... this may take some time to polish. @johannesjo What do you think about adding a "Beta" section in settings, when you're ready to integrate it - it can live in Beta for a little while so others can opt-in and shake it out if they want. I think it'll take some time to live with this to continue to refine it. I'm using it myself going forward. |
|
Thank you very much @brooksc (and @cledoux95 !) !! This is impressive work, and the integration fixes on top of #31 are exactly the kind of stuff that's painful to discover from the outside. I want to ship this, but I agree with you that it's not ready as a default-on feature, and I think your "Beta in Settings" instinct is the right framing. Let me make that concrete. Yes to beta-gating, with a real flag. Specifically:
Three things I'd want addressed before I merge, even into beta:
On your unsolved step #4 (coordinator noticing "agent done"): I think Claude Code's Graduation criteria I'd want to hit before flipping the flag on by default:
If you're up for the gating work I'm happy to keep this open and help where useful. And to be clear: I do want to ship this – the pilot/co-pilot pattern is the right shape for human-in-the-loop orchestration, and you've already shaved off a lot of the rakes. |
|
@johannesjo — thanks for the detailed review, this is exactly the kind of feedback that makes it worth sharing early. Lots to respond to. Where things stand: working e2eSince the original PR I've built considerably more on top. Here's what's confirmed working end-to-end: Coordinator creation & MCP tooling
Sub-task lifecycle
Coordinator ↔ sub-agent notification — and why we didn't use Stop hooks This is where we diverge from your suggestion, and I think for good reason. You suggested Instead, sub-agents call the Additional notification paths:
Pilot / co-pilot control handoff — how it works and two open items Each sub-task panel shows a banner indicating who's driving. When the coordinator has control the banner reads "Coordinator driving" in grey. Clicking "Take Control" blocks the coordinator's Two things to note as known gaps: first, the button labels ("Take Control" / "Return to Orchestrator") overstate what's happening — the coordinator agent keeps running, only its ability to write to that terminal is paused. "Pause coordinator" / "Resume coordinator" would be more accurate. Second, the sub-task's PromptInput and raw xterm terminal are not currently gated on control state, so a user could type simultaneously with a coordinator Unit test coverage What isn't done yetBeta gating — you're right that zero-footprint opt-in is the right approach. We haven't done the Settings dialog placement — with coordinator mode, Docker, verbose logging, and more to come, Settings is getting long. A tabbed settings dialog (General / Experimental) would be a natural home for the beta flag, but that's its own PR and should land independently of this one. Docker + coordinator — currently mutually exclusive. Sub-tasks spawned by a coordinator run as native host processes, defeating Docker isolation. Two approaches documented in Post-restart MCP path integration test — fair ask. The port/token rotation path is the right thing to test. Will add.
Minor items (in KNOWN-TODOS.md): orphaned sub-task badge UI, re-stage notification after user manually sends an edited prompt, configurable notification delay, control handoff input gating and button renaming. Happy to tackle the beta gating as the next chunk. Does the |
4057d03 to
0be08ca
Compare
|
Thanks @brooksc — the depth of the response (and the 379 tests!) tells me where this is heading, and I'm on board with the shape. A few replies and a few new things I noticed reading the diff today.
Beta gating shape: skipPermissions guardrail — the propagation checkbox is the right shape. Good refinement: a "propagate skip-permissions to sub-tasks" checkbox in the New Task dialog, defaulting off even when the coordinator itself was launched with skip-permissions. Don't auto-inherit silently. The 40-tasks-at-a-time workflow is real and worth optimizing for. A few things I caught on a code pass that haven't come up yet:
Item #1 is the only one that affects whether the test plan currently passes; the rest can ride along with the beta-gating PR. Excited about this — the pilot/co-pilot framing has aged well over the discussion. |
0be08ca to
ad6746d
Compare
Update — coordinator mode: major second pass, largely self-developedWe've done a substantial second pass on this feature. The headline: most of the work in this update was done by the coordinator itself — we used the tool to fix the tool. Video demo (recorded mid-session, before max-concurrent UI was added — shows the core flow live) Design document: The coordinator ran this PRTo validate the implementation, we gave the coordinator a single prompt:
From that one prompt, the coordinator:
The only human involvement was watching the coordinator work and confirming the final result — no manual coding. What changed since the initial revisionControl model renamed and extended
Max concurrent tasks UI Coordinator preamble rewritten from scratch
13 bugs fixed (all found during live coordinator sessions, most fixed by sub-agents):
Tests: 464 passing across 33 test files — this PR adds ~2,057 lines of new test code covering coordinator notification logic, control handoff, PTY idle detection, and git operations. What's still openTwo known edge cases, intentionally deferred:
Not yet tested: Docker integration — the coordinator Test plan
🤖 Generated with Claude Code |
|
I rechecked the current head (
|
|
I rechecked current head
These are mostly boundary/restore/scoping issues, but they affect the exact scenarios the beta gate is meant to make safe. |
|
Pushed a significant round of fixes and improvements since the initial draft. Summary below. Non-Docker coordinator — working well end-to-endThe core coordinator flow is solid in non-Docker mode and has been tested through several full runs. Key improvements in this update: Take Control / Release Control Autofire robustness Coordinator preamble Correctness / data integrity
Reliability
Security
UX
Docker coordinator — functional, one open design questionDocker mode works but there's an architectural question I'd like your input on before closing it out. The bind address problem: The coordinator's MCP HTTP server needs to be reachable from inside the Docker container via This is the same concern you raised on the mobile-remote surface, except now the exposed endpoints are fully mutating: Options on the table:
Option (a) from your earlier suggestion — bind to Happy to implement whichever direction you prefer. The rest of the Docker fixes ( Also noting a future improvement worth discussing: switching from |
|
I rechecked current head
These are all fixable, but they hit the core beta promises: no config loss, safe skip-permission gating, bounded network surface, Docker restart correctness, and reliable restart recovery. |
Update: Security hardening, refactoring, and static analysis toolingThis update adds to the original orchestrator control work across three areas. Security fixesP1 — Two-class token model (
P1 — Server bind address (
P1 — Route scoping and validator hardening (
P2 — Replay cache scoping (
P2 —
P3 — Docker-mode ordering (
RefactoringAtomic file writes (
Replay cache extracted (
Preamble module extracted (
Coordinator lifecycle state machine (
Testing
Static analysis toolingAdded four tools, all running locally (no GitHub Actions required):
Current findings:
Outstanding: Docker networking designThe P3 fix (ordering) is merged, but the underlying design has a deeper issue that isn't addressed here: the current model assumes one coordinator → one Docker container. When a coordinator spawns multiple sub-tasks, each needs its own container, but they all currently share a single PS: I'm traveling this week, so if you find any issues I'll resolve them later in the week. |
|
Thank you very much for the update! <3 Reviewed the update range
Smaller follow-up: |
Adds HTTP REST endpoints for task management (create, list, get, prompt, wait, diff, output, signal-done, merge, close) plus an MCP log ring buffer. Also adds AGENTS.md (project context for AI agents) and rebuild-integration.sh. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… diff truncation metadata, gitIsolation schema Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- wait_for_signal_done redesign: removed taskId parameter; returns whichever sub-task completes next with a `remaining` count so the coordinator can loop naturally without guessing order - merge_task / review_and_merge_task: operate in the sub-task's worktree so the target branch doesn't need to be checked out in the main repo - Docker: sub-agents spawned via docker exec into the coordinator's container when coordinator runs with Docker mode enabled - Preamble injection: writes signal_done instructions to the right file per agent type (AGENTS.md / GEMINI.md / .agent.md / settings.local.json) in the worktree rather than the project root; file is cleaned up by close_task - Notification system: suppress pending notifications during signal waits to prevent double-notify; escalate to orphaned notification after 10 missed autofire attempts; clear stale notifications on coordinator deregistration - get_task_output: append truncation sentinel when scrollback exceeds 20 000 chars so the coordinator knows it saw partial output - wait_for_idle: return reason field (idle / exited / human_control) - Autofire: countdown UI; don't treat previous staged text as user edit - Persistence: persist mcpConfigPath across app restarts; rewrite config on startup with current port/token - Tests: 63 unit tests in coordinator.test.ts; git mergeWorktreePath test Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Coordinators can drive sub-tasks via send_prompt; users can reclaim a task at any time with Take Control and hand it back with Release Control. This PR extends that model to the coordinator task itself. Control bar (TaskPanel) - Renamed "Pause/Resume coordinator" → "Take Control / Release Control" for clearer ownership language - Extended to coordinator tasks (coordinatorMode) in addition to sub-tasks (coordinatedBy): the same bar now appears on the coordinator's own panel - Auto-mode label: "Auto mode" for coordinator tasks, "Coordinator driving" for sub-tasks Input locking (PromptInput + TerminalView) - PromptInput textarea is disabled when controlledBy === 'coordinator'; onKeyDown/onInput guards provide belt-and-suspenders protection - TerminalView: term.options.disableStdin toggled reactively so xterm stops forwarding keystrokes to the PTY (the real input path users take) - Autofire interval skips ticks (no miss count) when controlledBy is human Coordinator task panel UX - Invisible pointer-events overlay gives coordinator task the same cursor:not-allowed treatment sub-tasks already had - Overlay click propagates to maybeShowControlHint, enabling the discoverability tooltip (shown ≤3 times, dismissible, "Don't show again") - PromptInput panel hidden (display:none, minSize getter → 0) in auto mode while keeping the component mounted so autofire keeps running Control state persistence - controlledBy persisted in PersistedTask; restored on app restart - Default: 'coordinator' for coordinator/coordinated tasks, undefined otherwise - setTaskControl skips the MCP_ControlChanged IPC for coordinator tasks (backend only knows sub-tasks, not the coordinator task itself) Bug fixes - MCP_TaskCreated handler now sets controlledBy:'coordinator' on new sub-tasks (was missing, leaving textarea unlocked) - Closing a coordinator task clears controlledBy on all children so their textareas re-enable when the control bar disappears - setTaskControl calls saveState() so Take Control survives autosave Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- design.md: comprehensive write-up of the coordinator mode feature — architecture diagram, Coordinator class internals, MCP tool surface, wait_for_signal_done redesign rationale, notification system, control handoff, autofire, discoverability hint, settings, security, Docker, New Task Dialog changes, and test coverage summary - TODOS.md (renamed from KNOWN-TODOS.md): easy coordinator test tasks, medium known issues (get_task_output truncation, merge_task live run, autofire timing edge case, MCP config staleness), hard backend hydration issues, and frontend test requirements (items 12–16) - Removed AGENTS.md from project root (preamble is now written per-agent into the worktree by the coordinator, not the project root) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Verifies that newly created sub-tasks have controlledBy set to 'coordinator' and coordinatedBy set to the coordinator task ID, with a regression guard ensuring controlledBy is always defined. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…l tick Extracts the autofire tick decision logic into processAutoFireTick (autofire-tick.ts) so it can be tested as a pure function without mounting the SolidJS component. Tests verify: - controlledBy==='human' returns 'paused' without touching the miss counter - controlledBy==='coordinator' increments miss count and fires when prompt visible Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extracts disableStdin logic into computeDisableStdin() and adds tests for coordinator-controlled (true), human-controlled (false), and undefined controlledBy (false) scenarios. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The merge_task tool merges into baseBranch (the coordinator's feature branch), not necessarily main. Update both the MCP tool description and coordinator preamble to reflect the correct behavior. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The coordinator preamble was being injected twice — once in NewTaskDialog.tsx (from src/lib/coordinator-preamble.ts) and again in src/store/tasks.ts (from src/store/coordinator-preamble.ts), causing coordinators to start with duplicated and contradictory instructions. Remove src/lib/coordinator-preamble.ts and its prepend in NewTaskDialog.tsx, consolidating all rules into the store version as single source of truth. The store version now includes the BAD/GOOD task-assignment guidance, baseBranch reminder, max-3-concurrent rule, file-overlap guidance, and the verify-before-assigning rule from the deleted lib version. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…_merge_task Both handlers now append ' [NOT COMMITTED — will be auto-committed on merge]' to any file entry where committed === false, matching the safety context the file-object type already carries. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extract validateStartMCPServerArgs from the handler into a testable exported function, then add Layer 4 tests covering all five rejection paths (non-absolute projectRoot/worktreePath, ".." traversal, non-string agentArgs elements, shell-special dockerContainerName). Each rejection test also spies on fs.writeFileSync/copyFileSync to confirm no I/O occurs before the error is thrown. Two positive tests verify optional fields (worktreePath, dockerContainerName) can be omitted. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…johannesjo#39 - johannesjo#33: restart round-trip — hydrateTask rewrites config with new subtaskToken (not coordinator token) when setMCPServerInfo has already run; and waitForIdle resolves after agent output fires post-hydration - johannesjo#36: mcpConfigPath directory scoping — path traversal rejected, wrong-dir rejected, correct host tmpdir accepted, Docker dirname(serverPath) accepted, Docker wrong-dir rejected - johannesjo#39: per-task close isolation — closing task-1 deletes only its config file; task-2 config and task entry remain untouched Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ocker kill awaited (johannesjo#41)
…annesjo#45 (wait_for_signal_done retry on network error)
…t after preamble block wait_for_signal_done (electron/mcp/client.ts): - On network-level TypeError (fetch failed, ECONNRESET), retry with exponential backoff up to 10 attempts (max 30 s per retry). HTTP errors (4xx/5xx) propagate immediately. - Since signal_done writes durable server-side state, a retry returns the already-signaled result instantly — coordinator no longer stalls when a transient connection drops. Preamble stripping (electron/mcp/coordinator.ts): - Add removePreambleBlock(): removes only the <sub-task-mode>...</sub-task-mode> block and its surrounding blank-line separators; preserves all content before and after. - stripPreambleFromBranch (merge path): use removePreambleBlock instead of slicing to EOF, fixing data loss when a sub-task legitimately edits a preamble-bearing file after the injected block. - getTaskDiff (review path): replace whole-section exclusion with buildNormalizedPreambleFileDiff, which generates a diff between the base version and the preamble-stripped worktree version. Files with only preamble changes produce an empty normalized diff and are still excluded; files with real changes beyond the preamble now appear in the coordinator's review diff. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… base wait_for_signal_done retry safety (client.ts, coordinator.ts, server.ts): - client.ts generates a stable requestId (UUID) per call and includes it in every retry attempt, so the server can replay the result instead of blocking again - coordinator.waitForSignalDone() accepts optional requestId; on first delivery it stores the result in recentlyDelivered (120 s TTL) keyed by requestId; on retry with the same id it returns the cached result immediately, avoiding a wait on an already-consumed signal - cacheDeliveredResult() evicts expired entries on each write to prevent unbounded growth - remote/server.ts extracts requestId from the request body and forwards it Preamble diff base (git.ts, coordinator.ts): - Export getDiffBaseSha() from git.ts — same merge-base + detectDiffBase logic used by getAllFileDiffs(), so all three callers (files, raw diff, normalized preamble diff) use identical base-selection semantics including fallback to detected main - getTaskDiff() computes baseSha once and passes it to buildNormalizedPreambleFileDiff() - buildNormalizedPreambleFileDiff() now accepts baseSha directly, eliminating the incorrect 'git merge-base HEAD HEAD' fallback when baseBranch is undefined Tests (coordinator.test.ts): - waitForSignalDone requestId replay: same id returns cached result; different id times out - removePreambleBlock unit tests: content after end marker preserved, before preserved, pure-preamble empty, malformed falls back, no-marker unchanged (6 cases) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… tests for johannesjo#34/johannesjo#37/johannesjo#40 - Extract validateBranchName into electron/mcp/validation.ts; apply at MCP and REST boundaries (johannesjo#38) - Fail-closed on malformed .mcp.json in register.ts: parse/validate before writing temp config (johannesjo#42) - Show staged coordinator prompt text and countdown in read-only TaskPanel without "Take Control" (johannesjo#44) - Add getTaskDiff preamble-bearing file normalization tests (AGENTS.md + settings.local.json) (johannesjo#34) - Add closeTask IPC ordering tests: MCP_CoordinatedTaskClosed and MCP_CoordinatorDeregistered (johannesjo#37) - Add .mcp.json merge/cleanup tests via deregisterCoordinator (johannesjo#40) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace local validateBranchName in register.ts and coordinator.ts with the shared validator from validation.ts — control-character and whitespace checks now apply on all entry paths (IPC, MCP, REST, coordinator-internal) - Move .mcp.json read/parse to before registerCoordinator/setMCPServerInfo so a malformed file fails fast without leaving partial coordinator state - Key waitForSignalDone replay cache by coordinatorTaskId:requestId to prevent cross-coordinator result leakage; add regression test Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ferral) - removePreambleBlock: return content unchanged when end marker is missing instead of stripping to EOF — eliminates data loss risk on malformed injected blocks - persistence.ts: set mcpStartupStatus 'pending' for coordinator and coordinated tasks on restore so TerminalView defers SpawnAgent until StartMCPServer/hydrateTask completes and rewrites config with the fresh session token - Update test: malformed-block expectation now asserts content preservation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…it cache warm, temp file cleanup - detectPreambleFiles: converted to async (fs/promises) to avoid blocking main thread - stripPreambleFromBranch: converted to async (fs/promises) for same reason - getTaskDiff: serialize baseSha before parallel getChangedFiles/getAllFileDiffs so both hit the warmed cache on the second call - MCP_CoordinatorDeregistered handler: delete temp sub-task config file on deregistration - persistence.ts: set mcpStartupStatus: 'pending' for coordinator/sub-task tasks on restore - coordinator.test.ts: updated preamble-bearing tests to mock fsReadFile (async) for detectPreambleFiles while keeping existsSync/readFileSync for buildNormalizedPreambleFileDiff Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ardening
P1-1 StartMCPServer binds 127.0.0.1 by default; 0.0.0.0 only in Docker mode.
StartRemoteServer keeps 0.0.0.0 (explicit user action for WiFi/mobile).
P1-2 Coordinator token never returned to renderer. Removed 'token' field from
StartRemoteServer, GetRemoteStatus, and StartMCPServer IPC returns.
GetMCPStatus.serverUrl now uses getMCPRemoteServerUrl (no token).
RemoteAccess store type and initial state updated accordingly.
P1-3 Third mobileToken generated at startup. wifiUrl/tailscaleUrl/url all
embed mobileToken instead of coordinator token. Mobile callers are
classified as 'mobile' and restricted to GET /api/agents, GET /api/tasks
(read-only). Only coordinator token grants WebSocket + write routes.
P2-1 redactServerUrl() exported from server.ts; used in register.ts console.warn
to strip token query param before logging.
P2-2 validateBranchName rejects path traversal (/../) and shell metacharacters
(`$(){}[]<>\\'*?!#;|&") in addition to existing control-char checks.
P2-3 MCP_HydrateCoordinatedTask now calls validateBranchName on branchName and
baseBranch (if present) — matches validation enforced by createTask.
P2-4 PARALLEL_CODE_MCP_TOKEN added to SpawnAgent ENV_BLOCK_LIST so a crafted
IPC call cannot override the MCP auth token in spawned agents.
P2-5 X-Coordinator-Id header is now verified against registered coordinators
(isRegisteredCoordinator) before being used for task scoping. Unregistered
IDs are silently ignored — caller sees unscoped view.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…inator lifecycle state machine - atomic.ts: atomicWriteFileSync/atomicWriteFile (write-to-tmp + rename) eliminates torn MCP config files on crash; all four config write sites now use it - replay-cache.ts: ReplayCache<T> replaces inline recentlyDelivered Map; TTL-based with eviction on write; key includes coordinatorTaskId to prevent cross-coordinator replay - preamble.ts: extracts removePreambleBlock, detectPreambleFiles, filterDiffSections, buildNormalizedPreambleFileDiff, stripPreambleFromBranch as standalone tested exports; removes five private methods from Coordinator god class (~150 lines) - types.ts: adds CoordinatorLifecycle = 'starting'|'ready'|'closing'|'closed'; transitions enforced in registerCoordinator/setMCPServerInfo/deregisterCoordinator - register.ts (P3): moves mcpConfig + mergedMcpJson computation before Docker copyFileSync so .mcp.json merge logic failures leave no filesystem residue - coordinator.test.ts: atomic.js mocked directly; removePreambleBlock tested via preamble.ts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…, Gitleaks) - Knip: dead-code detection; found 2 dead files + 18 unused exports (not removed, reported) - dependency-cruiser: enforces src/→electron boundary, no-circular, no-orphans - Semgrep: 9 custom rules covering token-in-URL, innerHTML XSS, eval, shell injection, pkill-f - Gitleaks: secret scanning with custom rules for MCP tokens and bearer tokens - Scripts: lint:dead, lint:arch, lint:security, lint:secrets, check:static in package.json - Pre-commit: Husky runs lint-staged + check on every commit Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tokens, atomic EXDEV, hydration gating P1 security: - Require X-Coordinator-Id for coordinator-class task routes (no unscoped fallback) - Per-task X-Done-Token header for signal_done; shared subtaskToken no longer sufficient - atomicWriteFile: use dirname(filePath) not os.tmpdir() to avoid cross-filesystem EXDEV - TerminalView: install MCP ready-state watcher regardless of initial status (fix retry spawn) - hydrateTask: throw if coordinator not registered; gate child hydration on coordinator ready P2 correctness: - Forward dockerImage in both createTask and retryTaskMcpStartup StartMCPServer calls - StartRemoteServer: rebind loopback-only MCP server to 0.0.0.0 on explicit remote start - Preserve pre-existing .mcp.json parallel-code entry; restore on deregistration Minor: semgrep/gitleaks scripts check for installation, print instructions if missing Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove TODOS.md, design.md, and openspec coordinator change proposal. Fix rebase-introduced duplicate declarations in focus.ts, types.ts, and missing questionActive param in PromptInput.test.ts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- navigation.test.ts: add focusedPanel to mock store and deep-path setStore handler to match setActiveTask's usage of store.focusedPanel - focus.test.ts: update ai-terminal panel expectation to include agent ID suffix since normalizeTaskPanel now resolves 'ai-terminal' → 'ai-terminal:<id>' Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
c80a029 to
aba40fc
Compare
Round-2 review fixesThis update addresses all 9 issues from @johannesjo's review of the b55dcc9..c80a029 range. P1 SecurityCoordinator token scoping — coordinator-class tokens now require an Per-task done tokens — Atomic write EXDEV — MCP retry spawn — Child hydration gating — P2 CorrectnessdockerImage propagation — Remote server rebind — when Preserve existing Minor
|
|
Wow! That is a lot of commits! Thank you very much! I have look :) |
Code review (round 3)Independent re-review covering the round-2 hardening commits. The token scoping / X-Done-Token / atomic-write changes look solid and the negative-test coverage in P1 — should fix before merge
Refuted from round-3 draft
P2 — should fix
Test gaps (would have caught #1, #2, #6, #8)
Process note18 702 additions / 79 files in one PR is tough to review with confidence. The security-fix series (atomic writes, X-Coordinator-Id, X-Done-Token, retry wrapper) and the feature additions (control banner, expanded Overall: thoughtful, defensively-designed work. The round-2 hardening addressed the worst issues, and the remaining P1 items are mostly tightening rather than fundamental rework. |
|
@johannesjo thank you for your feedback and patience as I work through this. Yes this is a very large PR, I didn't realize it would be so large when I started this. Given where we are, I'm happy to withdraw this and resubmit as a series of smaller PRs - or otherwise I'll continue to submit only fixes to any issues you find. Any future PRs I'll break down into smaller chunks and run by you in discussions any major planned changes. I'll submit an update shortly with fixes for the issues you found - let me know if you'd prefer me to withdraw and submit smaller PRs. At this point the feature is useful and I've been using it myself to give the coordinator a long list of items to dispatch and then sitting back, monitoring progress and waiting until it's all completed. But I have ideas to continue to refine this and will discuss that outside of this PR. |
…ync, branch hardening P1 fixes: - UUID-validate id/coordinatorTaskId in MCP_HydrateCoordinatedTask IPC handler (prevents tampered persisted state from directing token writes outside tmpdir) - Fix createTask/deregisterCoordinator race: re-check coordinator presence after createBackendTask await; clean up worktree and throw if coordinator gone - Log warning when Docker mode forces 0.0.0.0 bind (LAN-reachable API) - Fix hydrateTask subscriber/decoder leak: wrap output-monitoring setup in try/catch that removes tailBuffers/decoders/subscribers/task on failure - Fix signalDone mislabeling: skip maybeQueueReviewNotification for 'creating' status (renderer state inconsistent before agent has spawned) P2 fixes: - Cap waitForSignalDone retry sleep to remaining timeout (prevents 30s overshoot) - Track writtenMcpParallelCode in CoordinatorState; deregister only restores/deletes parallel-code key when current value matches what we wrote - Harden validateBranchName: reject leading/trailing /, .lock suffix, //, leading . (all rejected by git check-ref-format) - Add validateUUID export to validation.ts - atomicWriteFileSync: use openSync/writeSync/fsyncSync/closeSync for durability - atomicWriteFile: use open/fh.writeFile/fh.sync/fh.close for durability - removePreambleBlock: drop to EOF when END marker missing (prevents injected coordinator instructions being committed into branch history) - Await preamble restore in createTask catch block (fire-and-forget could race with cleanupTask removing the worktree) - Store onPtyEvent unsub handles in constructor; call prior handle before re-registering to prevent listener leak on repeated constructor calls Tests: 12 new tests covering UUID validation, extended branch-name rules, createTask/deregister race, waitForSignalDone timeout cap Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Round 3 review — all issues addressed ✅Thanks for the thorough review. Here's a summary of every point raised and what was done. P1 — CriticalP1-1 · UUID validation in Added P1-2 · Race: coordinator deregistered while After P1-3 · Docker mode binds MCP server to The bind address is now P1-4 · Resource leak when output monitoring setup fails The tail-buffer / decoder / subscriber setup is now wrapped in a try/catch. On failure, all three maps are cleaned up and the task entry is removed before re-throwing. P1-5 · The guard was changed from P2 — ImportantP2-6 · Rewrote to use the low-level P2-6 · Switched to the file-handle API ( P2-7 · Track what we wrote to
P2-8 · Branch-name validator missing several git-ref rules Added checks for: starts with P2-9 · Retry-sleep in MCP client can overshoot the timeout
P2-10 · Preamble: missing END marker should not silently preserve coordinator instructions Changed the fallback from returning content unchanged to dropping everything from the START marker to EOF, preventing coordinator instructions from being committed into branch history. A P2-11 · The P2-12 · PTY event listeners registered multiple times if coordinator is re-initialized Added Test coverageAll 12 fixes have corresponding tests. The suite grew from 794 → 806 passing tests. TypeScript strict-mode, ESLint (0 warnings), Prettier, compile, and build:mcp all pass clean. |
…ing doc) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ordinator+Docker active On Linux, sub-task containers use --network host so they share the host's loopback — bind to 127.0.0.1 instead of 0.0.0.0, removing LAN exposure. On macOS, Docker Desktop routes sub-task traffic through its internal VM network via host.docker.internal, which requires 0.0.0.0 on the host side. Binding to just the Docker bridge interface is not practical: Docker Desktop does not expose a stable docker0 equivalent on macOS and the internal host IP is documented as variable. So 0.0.0.0 remains necessary on macOS. Adds a visible warning in two places when coordinator + Docker is active on macOS: a banner in the NewTaskDialog at task creation time, and a persistent note in the TaskPanel coordinator control bar while the task is running. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
P1-3 follow-up: Docker + coordinator MCP bind addressAddressed in commit What was doneLinux ( macOS: two visible warnings added:
Both are scoped to Why a tighter bind on macOS isn't straightforwardOn macOS, Docker Desktop runs in a Linux VM. Sub-task containers connect to the host via The alternative would be to discover the Docker Desktop bridge IP at runtime and bind to it specifically. The options are all fragile:
The security exposure is real but limited: the MCP traffic itself (sub-task → If a more robust solution for macOS becomes possible (e.g. Docker Desktop exposing a stable bridge interface or a |
Code review (round 4)Thanks for the fast turnaround. I ran an independent multi-pass review over the round-3 fix commits ( P1 — round-3 items
P2 — round-3 items
Cross-cutting (not a blocker): the new synchronous Merge readiness: the branch is currently in conflict with On withdrawing vs. continuing: let's do the split. The security series here is solid and self-contained — I'd rather land that behind |
Overview
This PR builds on the initial coordinating agent implementation by @cledoux95 (PR #31) with substantial fixes and new capabilities.
The core motivation: I wanted to use a coordinator agent to drive parallel workstreams, but still be able to monitor each sub-task, answer questions when an agent gets stuck, and take over a task when I want to make a decision myself — then hand it back. Think of it like a pilot/co-pilot handoff: explicit, visible, and safe.
Credit
The foundation of this PR is @cledoux95's work in #31, which introduced the coordinating agent concept, the MCP server/client, and the
create_task/send_prompt/wait_for_idle/merge_tasktools. This PR would not exist without that work.What's new / changed
Sub-task visibility (addresses the core UX gap)
In #31, sub-tasks created by the coordinator were not visible in the sidebar. The coordinator agent was essentially working in the dark. Now each sub-task spawned via
create_taskappears as its own sidebar panel, giving you full terminal visibility into what every agent is doing — exactly like a manually created task.Pilot/co-pilot control handoff
Each coordinated sub-task shows a banner indicating who is driving:
You can click "Take Control" at any time to pause the coordinator for that task, interact with the agent yourself, then click "Return to Orchestrator" when done.
Expanded
create_taskoptionsThe coordinator agent can now specify:
skipPermissions: true— passes--dangerously-skip-permissionsto the sub-agent so it runs fully autonomously without tool-approval interruptionsgitIsolation: "worktree" | "direct" | "none"— controls git isolation mode (defaults to"worktree"as before)Fixes to #31
Several bugs were found and fixed while integrating and testing:
findFreePort(7777, 7800)tries ports sequentially until one is free--mcp-configarg — the coordinator agent was spawned without the MCP config flag, so it had no MCP tools and fell back to bash orchestrationTaskAITerminalnow passes--mcp-config <path>whentask.mcpConfigPathis setspawnAgentdirectly AND the renderer'sTerminalViewalso called it, killing the backend spawn and losing the output subscriptionspawnAgentcall; orchestrator now usesonPtyEvent('spawn', ...)to subscribe to output each time the renderer spawns the agent — survives restarts toofetch failedon all MCP tool callsApp.tsxnow awaitsStartMCPServerfor each persisted coordinator task before the agents are spawned, refreshing the config file with the new port/tokendeleteTaskwrong call signature — the orchestrator calleddeleteTaskwith positional args but the function now takes an options objectdeleteTask({ agentIds, branchName, deleteBranch, projectRoot })400for invalidname,prompt,projectId,skipPermissions,gitIsolationwaitForIdlehangs when human takes control —setTaskControl('human')left pending waiters stuck until timeoutsetTaskControlnow unblocks pendingwaitForIdlecallers on any control changemcp_control_changedmissing from preload allowlistALLOWED_CHANNELSinpreload.cjsfetch failedgave no actionable infoMCPClientnow wraps network errors: "Cannot reach Parallel Code at http://127.0.0.1:PORT. Is the app running?"Tests
Added 25 new tests across two files:
electron/mcp/prompt-detect.test.ts—stripAnsiandchunkContainsAgentPrompt(the core ofwaitForIdleidle detection)electron/mcp/orchestrator.test.ts— control handoff logic:sendPromptblocked/unblocked,waitForIdlebehavior under human control, PTY exit propagationThe tests caught the
waitForIdlehang bug described above before it shipped.Test plan
fetch failed)create_taskwithskipPermissions: truespawns sub-agent without permission promptsnpm run check && npm test— all pass🤖 Generated with Claude Code
Review response (round 2)
Addressed all issues from @johannesjo's second review pass (
b55dcc9..c80a029):Security (P1)
X-Coordinator-Idgave unscoped access to all tasksX-Coordinator-Idheader on task routes; missing or unregistered header → 403signal_donenot task-scoped — any sub-task withsubtaskTokencould signal done for any other taskdoneTokenin its MCP config; must be sent asX-Done-Tokenheader on/done; wrong or missing token → 403os.tmpdir()threw onrename()when/tmpand the target are on different mountsdirname(filePath)so it's always on the same mountpending; clicking Retry fromerrordid nothingerror → readycorrectlyPromise.allSettled+ unconditional loop hydrated children even when coordinator MCP restore failedmcpStartupStatus !== 'ready';hydrateTaskthrows if coordinator is unregisteredCorrectness (P2)
StartMCPServercalls increateTaskandretryTaskMcpStartupomitteddockerImagedockerImageis now forwarded in both call sitesStartMCPServerbinds to127.0.0.1;StartRemoteServernow stops and restarts on0.0.0.0if no coordinator is active.mcp.jsonparallel-codeentry overwritten then deleted — startup overwrote any pre-existing entry; deregistration always deleted itCoordinatorState.previousMcpParallelCodeat write time and restored on deregistrationMinor
semgrep/gitleaksscripts now check whether the CLI is installed and print install instructions (brew install semgrep, etc.) rather than failing with a cryptic error; these are system tools and cannot be indevDependencies