Add trial harness and MCP server for automated multi-agent coordination #45
Open
noelsaw1 wants to merge 3 commits into
Replaces manual VS Code chat-panel coordination with a headless, observable, CLI-driven trial runner for the coordination layer.
- bin/trial: list | doctor | validate | preflight | run
- driver abstraction (gemini --yolo, codex exec --full-auto, claude -p) plus a deterministic mock driver, so the battery runs with no API keys
- deterministic PROJECT-SPEC parser (ingestion stage 1) with validation
- preflight question round and a human gate before any work begins
- isolated per-run workspace (its own .tick/ plus a throwaway git repo)
- observability: run.jsonl spine, per-agent transcripts, report/SUMMARY.md
- battery: 2 build + 2 debug scenarios; test/smoke.sh passes 11/11

https://claude.ai/code/session_01WnzAdCRGrrhukvW1etFLyB
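The PROJECT-SPEC parser's validation step (the summary lists duplicate IDs, cycles, and empty scopes as hard failures) can be sketched as a pure function over already-parsed tasks. This is a minimal illustration, not src/spec.js: the markdown spec format is not shown in this PR, so the task shape `{ id, scope, deps }` and the names `validateTasks` and `assertValidSpec` are hypothetical.

```javascript
// Hypothetical sketch of spec validation.
// Assumed task shape: { id: string, scope: string[], deps: string[] }.
function validateTasks(tasks) {
  const errors = [];
  const byId = new Map();

  // Duplicate IDs and empty scopes are checked in a single pass.
  for (const task of tasks) {
    if (byId.has(task.id)) errors.push(`duplicate task id: ${task.id}`);
    else byId.set(task.id, task);
    if (!Array.isArray(task.scope) || task.scope.length === 0) {
      errors.push(`empty scope: ${task.id}`);
    }
  }

  // Cycle detection: depth-first search with white/grey/black marking.
  // An edge to a grey (still on the DFS stack) task closes a cycle.
  const WHITE = 0, GREY = 1, BLACK = 2;
  const color = new Map([...byId.keys()].map((id) => [id, WHITE]));
  const visit = (id, stack) => {
    color.set(id, GREY);
    stack.push(id);
    for (const dep of byId.get(id).deps || []) {
      if (!byId.has(dep)) continue; // unknown IDs are out of scope for this sketch
      if (color.get(dep) === GREY) {
        const cycle = stack.slice(stack.indexOf(dep)).concat(dep);
        errors.push(`dependency cycle: ${cycle.join(" -> ")}`);
      } else if (color.get(dep) === WHITE) {
        visit(dep, stack);
      }
    }
    stack.pop();
    color.set(id, BLACK);
  };
  for (const id of byId.keys()) {
    if (color.get(id) === WHITE) visit(id, []);
  }
  return errors;
}

// Hard-fail: refuse to start a run if any check failed.
function assertValidSpec(tasks) {
  const errors = validateTasks(tasks);
  if (errors.length > 0) {
    throw new Error(`spec validation failed:\n${errors.join("\n")}`);
  }
  return tasks;
}
```

Collecting every error before failing lets a spec author fix all problems in one pass; `assertValidSpec` is where the hard-fail happens, so no run starts from a spec with, say, `A -> B -> A` in its dependency graph.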
- mcp/tick-mcp.js: zero-dependency MCP stdio server exposing all 13 tick coordination verbs as typed tools; a drop-in alternative to the CLI (same src/ engine). Includes wiring docs, an example .mcp.json, and mcp-smoke.js (8/8).
- harness --transport cli|mcp: agents coordinate via ./tick or via the MCP tools. MCP mode auto-writes a workspace .mcp.json bound to the run's isolated state and uses an MCP-flavored agent prompt.
- harness/test/confirm-cli-orchestration.sh + test/fake-cli/: confirm that the harness executes and monitors the real gemini/codex command shapes (gemini --yolo, codex exec --full-auto -): prompt on stdin, identity from the prompt, coordination via tick, monitoring through to a clean exit, and path routing with no collisions (7/7).

https://claude.ai/code/session_01WnzAdCRGrrhukvW1etFLyB
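The server side of an MCP stdio tool server (initialize, ping, tools/list, tools/call, one JSON-RPC message per line) is small enough to sketch directly. This is a minimal, hypothetical dispatcher, not mcp/tick-mcp.js: `createMcpHandler`, `handleLine` and `serveStdio` are illustrative names, the tool objects with a synchronous `run(args)` stand in for calls into the src/ engine, and the protocol revision string is an assumption.

```javascript
const readline = require("readline");

const PROTOCOL_VERSION = "2024-11-05"; // assumed MCP revision for this sketch

// tools: [{ name, description, inputSchema, run(args) }]. In the real server the
// run functions would call into the shared src/ engine; here they are stand-ins.
function createMcpHandler(tools, serverInfo = { name: "tick-sketch", version: "0.0.0" }) {
  const byName = new Map(tools.map((t) => [t.name, t]));
  const ok = (id, result) => ({ jsonrpc: "2.0", id, result });
  const fail = (id, code, message) => ({ jsonrpc: "2.0", id, error: { code, message } });

  // Returns a response object, or null for notifications (messages without an id).
  return function handleMessage(msg) {
    if (!msg || typeof msg !== "object" || msg.id === undefined) return null;
    switch (msg.method) {
      case "initialize":
        return ok(msg.id, { protocolVersion: PROTOCOL_VERSION, capabilities: { tools: {} }, serverInfo });
      case "ping":
        return ok(msg.id, {});
      case "tools/list":
        // Publish only the schema-facing fields, never the run function.
        return ok(msg.id, {
          tools: tools.map(({ name, description, inputSchema }) => ({ name, description, inputSchema })),
        });
      case "tools/call": {
        const { name, arguments: args = {} } = msg.params || {};
        const tool = byName.get(name);
        if (!tool) return fail(msg.id, -32602, `unknown tool: ${name}`);
        try {
          const output = tool.run(args);
          return ok(msg.id, { content: [{ type: "text", text: JSON.stringify(output) }] });
        } catch (err) {
          // Tool failures are returned in-band so the agent can read them and react.
          const text = String(err && err.message ? err.message : err);
          return ok(msg.id, { content: [{ type: "text", text }], isError: true });
        }
      }
      default:
        return fail(msg.id, -32601, `method not found: ${msg.method}`);
    }
  };
}

// One JSON-RPC message per line in, at most one reply line out.
function handleLine(handleMessage, line) {
  if (!line.trim()) return null;
  let msg;
  try {
    msg = JSON.parse(line);
  } catch {
    return JSON.stringify({ jsonrpc: "2.0", id: null, error: { code: -32700, message: "parse error" } });
  }
  const reply = handleMessage(msg);
  return reply === null ? null : JSON.stringify(reply);
}

// Wire the handler to stdio; stdout carries protocol messages only.
function serveStdio(handleMessage, input = process.stdin, output = process.stdout) {
  const rl = readline.createInterface({ input });
  rl.on("line", (line) => {
    const reply = handleLine(handleMessage, line);
    if (reply !== null) output.write(reply + "\n");
  });
  return rl;
}
```

Two design points carry over to any such server: stdout is reserved for protocol messages (diagnostics belong on stderr), and an unknown tool is a JSON-RPC error while a tool that runs and fails returns `isError: true`, which the calling agent can read and act on.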
- trial mcp-doctor: preflight health check for the tick MCP server (handshake + tools/list + a non-mutating tick_analyze round-trip). Uses throwaway state by default; --repo-root . checks real state read-only.
- .mcp.json: register the `tick` server at the repo root so Claude Code can coordinate via tick_* tools directly.
- mcp/client.js: extract the shared MCP stdio JSON-RPC client (dedupes the smoke-test client).

Regression: mcp-smoke 8/8, battery 11/11, confirm-cli-orchestration 7/7.

https://claude.ai/code/session_01WnzAdCRGrrhukvW1etFLyB
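The mcp-doctor sequence (handshake, tools/list, non-mutating tick_analyze round-trip) maps onto a small client that matches replies to requests by JSON-RPC id. The sketch below is hypothetical, not mcp/client.js: `StdioJsonRpcClient` and `mcpDoctor` are illustrative names, and the protocol version string is an assumption.

```javascript
const readline = require("readline");

// Hypothetical line-delimited JSON-RPC client: one message per line each way.
class StdioJsonRpcClient {
  // fromServer: readable carrying server output; toServer: writable feeding server input.
  constructor(fromServer, toServer) {
    this.nextId = 1;
    this.pending = new Map(); // id -> { resolve, reject }
    this.toServer = toServer;
    this.rl = readline.createInterface({ input: fromServer });
    this.rl.on("line", (line) => this.onLine(line));
  }

  onLine(line) {
    let msg;
    try {
      msg = JSON.parse(line);
    } catch {
      return; // not a protocol message; ignore it
    }
    const waiter = msg && typeof msg === "object" ? this.pending.get(msg.id) : undefined;
    if (!waiter) return; // notifications and unknown ids are ignored here
    this.pending.delete(msg.id);
    if (msg.error) waiter.reject(new Error(`${msg.error.code}: ${msg.error.message}`));
    else waiter.resolve(msg.result);
  }

  request(method, params = {}) {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.send({ jsonrpc: "2.0", id, method, params });
    });
  }

  notify(method, params = {}) {
    this.send({ jsonrpc: "2.0", method, params });
  }

  send(msg) {
    this.toServer.write(JSON.stringify(msg) + "\n");
  }

  close() {
    this.rl.close();
  }
}

// Doctor sequence: handshake, tools/list, then one read-only tool call.
async function mcpDoctor(client) {
  const init = await client.request("initialize", {
    protocolVersion: "2024-11-05", // assumed revision
    capabilities: {},
    clientInfo: { name: "mcp-doctor-sketch", version: "0.0.0" },
  });
  client.notify("notifications/initialized");

  const { tools } = await client.request("tools/list");
  const names = tools.map((t) => t.name);
  if (!names.includes("tick_analyze")) throw new Error("tick_analyze is not exposed");

  const analyze = await client.request("tools/call", { name: "tick_analyze", arguments: {} });
  return {
    server: init.serverInfo ? init.serverInfo.name : null,
    toolCount: names.length,
    analyzeOk: analyze.isError !== true,
  };
}
```

Against a real server one would spawn it with piped stdio, e.g. `spawn(process.execPath, [serverPath], { stdio: ["pipe", "pipe", "inherit"] })` where `serverPath` is a placeholder, and pass `child.stdout` and `child.stdin` to the client. Keeping the doctor's only tool call read-only is what makes it safe to point at real state.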
Summary
This PR adds two major new capabilities to the Trinity coordination layer:
- Trial harness (experiments/coordination-layer/harness/): a complete system for running automated, repeatable multi-agent coordination trials from the CLI, replacing manual chat-panel babysitting with headless agent execution.
- MCP server (experiments/coordination-layer/mcp/tick-mcp.js): exposes all coordination verbs as typed MCP tools, so agents can coordinate via MCP instead of shelling out to the CLI.

Both are thin adapters over the existing src/ coordination engine, so behavior is identical to the CLI: they are protocol frontends, not reimplementations.

Key Changes
Trial Harness (harness/)
- src/spec.js: deterministic, zero-LLM parsing of human-authored project specs (markdown) into structured task lists, with validation (duplicate IDs, cycles, and empty scopes all hard-fail).
- src/run.js: builds isolated per-run workspaces (throwaway git repos with their own .tick/ state), seeds backlogs, spawns agent CLIs concurrently, captures transcripts, runs verification, and generates reports.
- src/drivers.js: unified interface for invoking Gemini, Codex, Claude, and mock agents headlessly (prompt on stdin, no chat UI).
- src/preflight.js: agents ask clarifying questions before work begins; the questions are collected and a human gate pauses the run for review.
- src/observe.js: structured JSONL spine, per-agent transcripts, and a final analysis report.
- bin/trial: list, doctor, validate, preflight, and run subcommands, with options for agent override, transport selection, timeouts, and circuit-breaker tuning.
- test/smoke.sh: validates the entire battery with the deterministic mock driver (no API keys needed).

MCP Server (mcp/tick-mcp.js)
- Protocol methods: initialize, tools/list, tools/call, ping.
- Tools: tick_init, tick_log, tick_project, tick_take, tick_next, tick_claim, tick_scope, tick_release, tick_break, tick_done, tick_reap, tick_analyze.
- mcp/test/mcp-smoke.js validates the protocol end-to-end.
- .mcp.json wiring for Claude Code.

Integration
- prompts/agent-loop.md and prompts/agent-loop-mcp.md: agent prompts for CLI and MCP modes, respectively.
- prompts/preflight.md: preflight prompt.
- src/prompts.js: prompt handling with template rendering.
- src/proc.js: spawn, feed stdin, capture stdout/stderr, enforce timeouts.

Notable Implementation Details
- Each run gets an isolated workspace (a throwaway git repo) with its own .tick/ state; the real repo is never touched. Trials are safe to run repeatedly and in parallel.
- harness/src/mock-agent.js and test/fake-cli/: stand in for the real Gemini/Codex CLIs and exercise the full coordination protocol and harness observability without API keys.

https://claude.ai/code/session_01WnzAdCRGrrhukvW1etFLyB
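A mock can stand in for the real CLIs because every agent sits behind one driver interface: prompt on stdin, output captured, timeout enforced, working directory set to the run's own workspace. A minimal sketch of that shape, assuming a table of command shapes; `DRIVERS`, `runAgent`, and the byte-counting mock are hypothetical and far simpler than src/drivers.js, src/proc.js, or harness/src/mock-agent.js. The flags are copied from the PR text; that each CLI reads its prompt from stdin in this form is the PR's claim, not something this sketch checks.

```javascript
const { spawn } = require("child_process");

// Hypothetical mock: counts prompt bytes on stdin and prints a deterministic line.
// The PR's harness/src/mock-agent.js does far more (it speaks the tick protocol).
const MOCK_SCRIPT =
  'let n = 0; process.stdin.on("data", (c) => (n += c.length));' +
  ' process.stdin.on("end", () => console.log("mock read " + n + " bytes"));';

// Command shapes quoted in the PR; every driver takes its prompt on stdin.
const DRIVERS = {
  gemini: { cmd: "gemini", args: ["--yolo"] },
  codex: { cmd: "codex", args: ["exec", "--full-auto", "-"] },
  claude: { cmd: "claude", args: ["-p"] },
  mock: { cmd: process.execPath, args: ["-e", MOCK_SCRIPT] },
};

// Spawn one agent in its workspace, feed the prompt on stdin, capture output,
// and kill it if it outlives timeoutMs. Resolves with a result record.
function runAgent(driverName, prompt, { cwd, timeoutMs = 60000 } = {}) {
  const driver = DRIVERS[driverName];
  if (!driver) return Promise.reject(new Error(`unknown driver: ${driverName}`));

  return new Promise((resolve) => {
    const child = spawn(driver.cmd, driver.args, { cwd, stdio: ["pipe", "pipe", "pipe"] });
    let stdout = "";
    let stderr = "";
    let timedOut = false;

    const timer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGTERM");
    }, timeoutMs);

    child.stdout.setEncoding("utf8");
    child.stderr.setEncoding("utf8");
    child.stdout.on("data", (chunk) => (stdout += chunk));
    child.stderr.on("data", (chunk) => (stderr += chunk));

    // Spawn failures (e.g. CLI not installed) become a result, not a crash.
    child.on("error", (err) => {
      clearTimeout(timer);
      resolve({ code: null, signal: null, stdout, stderr: stderr + String(err), timedOut });
    });
    child.on("close", (code, signal) => {
      clearTimeout(timer);
      resolve({ code, signal, stdout, stderr, timedOut });
    });

    // Ignore EPIPE if the agent exits before reading all of its prompt.
    child.stdin.on("error", () => {});
    child.stdin.end(prompt);
  });
}
```

With `cwd` pointing at a fresh per-run directory, each agent sees only its own workspace, which is what lets concurrent runs stay apart in this sketch. For example, `runAgent("mock", "hello", { cwd })` resolves with exit code 0 and stdout `mock read 5 bytes` followed by a newline.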