
Add trial harness and MCP server for automated multi-agent coordination #45

Open
noelsaw1 wants to merge 3 commits into experiment/coordination-layer from claude/hopeful-noether-9xJXI


noelsaw1 commented Jun 8, 2026


Summary

This PR adds two major new capabilities to the Trinity coordination layer:

  1. Trial harness (experiments/coordination-layer/harness/) — a complete system for running automated, repeatable multi-agent coordination trials from the CLI, replacing manual chat-panel babysitting with headless agent execution.
  2. MCP server (experiments/coordination-layer/mcp/tick-mcp.js) — exposes all coordination verbs as typed MCP tools, allowing agents to coordinate via MCP instead of shelling out to the CLI.

Both are thin adapters over the existing src/ coordination engine, so behavior is identical to the CLI — they're protocol frontends, not reimplementations.

Key Changes

Trial Harness (harness/)

  • Spec parser (src/spec.js) — deterministic, zero-LLM parsing of human-authored project specs (markdown) into structured task lists with validation (duplicate IDs, cycles, empty scopes all hard-fail).
  • Orchestrator (src/run.js) — builds isolated per-run workspaces (throwaway git repos with their own .tick/ state), seeds backlogs, spawns agent CLIs concurrently, captures transcripts, runs verification, and generates reports.
  • Driver abstraction (src/drivers.js) — unified interface for invoking Gemini, Codex, Claude, and mock agents headlessly (prompt on stdin, no chat UI).
  • Preflight phase (src/preflight.js) — agents ask clarifying questions before work begins; questions are collected and a human gate pauses the run for review.
  • Observability (src/observe.js) — structured JSONL spine + per-agent transcripts + final analysis report.
  • CLI (bin/trial) — list, doctor, validate, preflight, run subcommands with options for agent override, transport selection, timeouts, and circuit-breaker tuning.
  • Trial specs — four example projects (build-todo-api, build-url-shortener, debug-calc-bugs, debug-poisoned-task) demonstrating build, debug, and circuit-breaker scenarios.
  • Fixtures — seeded-bug code (calc-bugs, poison) for debug trials.
  • Test harness (test/smoke.sh) — validates the entire battery with the deterministic mock driver (no API keys needed).
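
The validation rules named for the spec parser can be sketched as one pass over already-parsed tasks. This is a minimal illustration, not the code in src/spec.js: the task shape `{ id, scope, priority, deps }` and the name `validateTasks` are assumptions, and the markdown parsing step is left out. Cycle detection uses a standard three-state depth-first search.

```javascript
// Hypothetical task shape: { id: string, scope: string[], priority: number, deps?: string[] }
// Returns every error found; an empty array means the spec passed validation.
function validateTasks(tasks) {
  const errors = [];
  const byId = new Map();

  for (const t of tasks) {
    if (byId.has(t.id)) errors.push(`duplicate task id: ${t.id}`);
    byId.set(t.id, t);
    if (!Array.isArray(t.scope) || t.scope.length === 0) {
      errors.push(`empty scope: ${t.id}`);
    }
    if (typeof t.priority !== "number" || Number.isNaN(t.priority)) {
      errors.push(`non-numeric priority: ${t.id}`);
    }
  }

  // Depth-first search with three states: undefined = unvisited,
  // 1 = on the current path, 2 = fully explored. Reaching a task that is
  // still on the current path means the dependencies form a cycle.
  const state = new Map();
  const visit = (id, path) => {
    if (state.get(id) === 2) return;
    if (state.get(id) === 1) {
      errors.push(`dependency cycle: ${[...path, id].join(" -> ")}`);
      return;
    }
    state.set(id, 1);
    for (const dep of byId.get(id).deps || []) {
      if (byId.has(dep)) visit(dep, [...path, id]);
    }
    state.set(id, 2);
  };
  for (const id of byId.keys()) visit(id, []);

  return errors;
}
```

Collecting every error instead of throwing on the first would let a `validate` step report all problems in a spec at once; a caller that wants hard-fail behaviour can exit non-zero whenever the array is non-empty.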

MCP Server (mcp/tick-mcp.js)

  • Minimal JSON-RPC 2.0 over newline-delimited stdio (the MCP stdio transport).
  • Zero dependencies (per spike rules).
  • Implements initialize, tools/list, tools/call, ping.
  • Tools map 1:1 to CLI verbs: tick_init, tick_log, tick_project, tick_take, tick_next, tick_claim, tick_scope, tick_release, tick_break, tick_done, tick_reap, tick_analyze.
  • Smoke test (mcp/test/mcp-smoke.js) validates the protocol end-to-end.
  • Example .mcp.json wiring for Claude Code.
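
To make the protocol surface above concrete, here is a hypothetical, zero-dependency dispatcher for newline-delimited JSON-RPC 2.0 using only Node's built-in `readline`. It is not mcp/tick-mcp.js: the single `tick_analyze` entry returns a canned string where a real server would call the src/ engine, and the fallback protocol version string is an assumption. Messages without an `id` are treated as notifications and get no reply, as JSON-RPC 2.0 requires.

```javascript
const readline = require("node:readline");

// Placeholder tool table. A real server would call into the coordination
// engine here; this single hypothetical entry returns a canned string.
const tools = {
  tick_analyze: {
    description: "Summarize coordination state (placeholder)",
    inputSchema: { type: "object", properties: {} },
    run: () => "no state loaded",
  },
};

// Map one parsed JSON-RPC message to a response object, or to null for a
// notification (a message without an id), which must not be answered.
function handleMessage(msg) {
  const reply = (result) => ({ jsonrpc: "2.0", id: msg.id, result });
  const fail = (code, message) => ({ jsonrpc: "2.0", id: msg.id, error: { code, message } });
  if (msg.id === undefined) return null;

  switch (msg.method) {
    case "initialize":
      return reply({
        // Simplification: echo the client's requested version (fallback is an assumption).
        protocolVersion: (msg.params && msg.params.protocolVersion) || "2024-11-05",
        capabilities: { tools: {} },
        serverInfo: { name: "tick-sketch", version: "0.0.0" },
      });
    case "ping":
      return reply({});
    case "tools/list":
      return reply({
        tools: Object.entries(tools).map(([name, t]) => ({
          name,
          description: t.description,
          inputSchema: t.inputSchema,
        })),
      });
    case "tools/call": {
      const { name, arguments: args } = msg.params || {};
      const known = Object.prototype.hasOwnProperty.call(tools, name);
      if (!known) return fail(-32602, `unknown tool: ${name}`);
      return reply({ content: [{ type: "text", text: tools[name].run(args || {}) }] });
    }
    default:
      return fail(-32601, `method not found: ${msg.method}`);
  }
}

// Read one JSON message per line from `input` and write one JSON response
// per line to `output` (newline-delimited framing).
function serve(input, output) {
  const send = (obj) => output.write(JSON.stringify(obj) + "\n");
  const rl = readline.createInterface({ input });
  rl.on("line", (line) => {
    if (!line.trim()) return;
    let msg;
    try {
      msg = JSON.parse(line);
    } catch {
      send({ jsonrpc: "2.0", id: null, error: { code: -32700, message: "parse error" } });
      return;
    }
    if (msg === null || typeof msg !== "object" || Array.isArray(msg)) {
      send({ jsonrpc: "2.0", id: null, error: { code: -32600, message: "invalid request" } });
      return;
    }
    const response = handleMessage(msg);
    if (response) send(response);
  });
}
```

Wired to a real process this would be `serve(process.stdin, process.stdout)`. Because stdout carries the protocol stream, any diagnostics such a server prints belong on stderr.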

Integration

  • Agent prompts for both CLI (prompts/agent-loop.md) and MCP (prompts/agent-loop-mcp.md) modes.
  • Preflight prompt (prompts/preflight.md).
  • Prompt builder (src/prompts.js) with template rendering.
  • Process runner (src/proc.js) — spawn, feed stdin, capture stdout/stderr, enforce timeouts.
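
The process runner's job (spawn, feed stdin, capture output, enforce a timeout) fits in one promise-returning helper. This is a minimal sketch using Node's `child_process.spawn`; the name `runProcess` and the result shape are illustrative assumptions rather than the API of src/proc.js.

```javascript
const { spawn } = require("node:child_process");

// Spawn `cmd` with `args`, write `stdinText` to its stdin, collect stdout and
// stderr, and kill the child if it runs longer than `timeoutMs`.
// Resolves with { code, signal, stdout, stderr, timedOut } even on a non-zero
// exit; rejects only if the process cannot be spawned (e.g. command not found).
function runProcess(cmd, args, stdinText, timeoutMs) {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { stdio: ["pipe", "pipe", "pipe"] });
    let stdout = "";
    let stderr = "";
    let timedOut = false;

    child.stdout.setEncoding("utf8");
    child.stderr.setEncoding("utf8");
    child.stdout.on("data", (chunk) => { stdout += chunk; });
    child.stderr.on("data", (chunk) => { stderr += chunk; });

    const timer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGKILL");
    }, timeoutMs);

    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });
    child.on("close", (code, signal) => {
      clearTimeout(timer);
      resolve({ code, signal, stdout, stderr, timedOut });
    });

    // Deliver the prompt on stdin, then close it so the child sees EOF.
    child.stdin.on("error", () => {}); // ignore EPIPE if the child exits early
    child.stdin.end(stdinText);
  });
}
```

Resolving with `timedOut: true` instead of rejecting keeps a hung agent distinguishable from one that failed to start, which matters when a report has to tell the two apart.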

Notable Implementation Details

  • Isolation by design: Each trial run gets its own workspace with its own .tick/ state; the real repo is never touched. Trials are safe to run repeatedly and in parallel.
  • Deterministic mock driver (harness/src/mock-agent.js and test/fake-cli/) — stands in for real Gemini/Codex CLIs, exercises the full coordination protocol and harness observability without API keys.
  • Spec validation is strict — duplicate task IDs, empty scopes, non-numeric priority, dependency cycles, and
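
The per-run isolation described above can be sketched with Node's `fs.mkdtempSync`, which appends random characters to the given prefix so two runs never share a directory. The name `createRunWorkspace` is hypothetical, and the sketch assumes `git` is on PATH; if it is not, it still returns the directory, with `git: false`.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");
const { execFileSync } = require("node:child_process");

// Create a fresh, uniquely named workspace for one trial run: its own
// directory, its own .tick/ state directory, and (if git is available) its
// own throwaway git repository. Nothing outside `root` is written.
function createRunWorkspace(runId) {
  const root = fs.mkdtempSync(path.join(os.tmpdir(), `trial-${runId}-`));
  const tickDir = path.join(root, ".tick");
  fs.mkdirSync(tickDir);

  let git = false;
  try {
    execFileSync("git", ["init", "-q"], { cwd: root, stdio: "ignore" });
    git = true;
  } catch {
    // git missing or failed; the directory and .tick/ are still usable.
  }
  return { root, tickDir, git };
}
```

Because each call gets a distinct random suffix, two runs started with the same `runId` at the same time still land in separate directories, which is what makes parallel trials safe in this sketch.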

https://claude.ai/code/session_01WnzAdCRGrrhukvW1etFLyB

claude added 3 commits June 8, 2026 05:16
Replaces manual VS Code chat-panel coordination with a headless, observable,
CLI-driven trial runner for the coordination layer.

- bin/trial: list | doctor | validate | preflight | run
- driver abstraction (gemini --yolo, codex exec --full-auto, claude -p) + a
  deterministic mock driver so the battery runs with no API keys
- deterministic PROJECT-SPEC parser (ingestion stage 1) with validation
- preflight question round + human gate before any work
- isolated per-run workspace (own .tick/ + throwaway git repo)
- observability: run.jsonl spine, per-agent transcripts, report/SUMMARY.md
- battery: 2 build + 2 debug scenarios; test/smoke.sh green 11/11

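The driver abstraction from the first commit reduces to a table of command shapes, each taking its prompt on stdin. The gemini, codex and claude shapes below are the ones listed in the commit messages; the `mock` entry, the `DRIVERS` table and the `buildInvocation` helper are hypothetical illustrations, not the contents of src/drivers.js.

```javascript
// Command shapes for the headless drivers. Every driver receives its prompt
// on stdin; none of them opens a chat UI.
const DRIVERS = {
  gemini: { cmd: "gemini", args: ["--yolo"] },
  codex: { cmd: "codex", args: ["exec", "--full-auto", "-"] },
  claude: { cmd: "claude", args: ["-p"] },
  // Hypothetical: assumes the mock agent is a Node script that also reads stdin.
  mock: { cmd: process.execPath, args: ["harness/src/mock-agent.js"] },
};

// Resolve a driver name to a concrete invocation. The result is plain data,
// so a process runner can spawn it and a test can inspect it.
function buildInvocation(agent, prompt) {
  const driver = Object.prototype.hasOwnProperty.call(DRIVERS, agent) ? DRIVERS[agent] : null;
  if (!driver) {
    const known = Object.keys(DRIVERS).join(", ");
    throw new Error(`unknown driver "${agent}" (expected one of: ${known})`);
  }
  return { cmd: driver.cmd, args: [...driver.args], stdin: prompt };
}
```

Keeping the invocation as data means a runner can spawn it unchanged, while tests can check the exact argument vector without launching any agent.
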
- mcp/tick-mcp.js: zero-dependency MCP stdio server exposing all 13 tick
  coordination verbs as typed tools, a drop-in alternative to the CLI (same
  src/ engine). Wiring docs + example .mcp.json + mcp-smoke.js (8/8).
- harness --transport cli|mcp: agents coordinate via ./tick or the MCP tools;
  MCP mode auto-writes a workspace .mcp.json bound to the run's isolated state,
  and uses an MCP-flavored agent prompt.
- harness/test/confirm-cli-orchestration.sh + test/fake-cli/: confirm the
  harness executes and monitors the real gemini/codex command shapes
  (gemini --yolo, codex exec --full-auto -) — prompt on stdin, identity from
  prompt, coordinate via tick, monitored to clean exit, path-routed with no
  collisions (7/7).

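MCP mode's auto-written workspace .mcp.json can be sketched as follows. The top-level `mcpServers` map with `command`, `args` and `env` per server is the shape Claude Code reads from a project .mcp.json; the `TICK_REPO_ROOT` variable used here to bind the server to the run's isolated state is a hypothetical name chosen for illustration, as is `writeWorkspaceMcpConfig`.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Write a workspace-local .mcp.json that registers a `tick` stdio server.
// `serverScript` is the path to the MCP server entry point. TICK_REPO_ROOT is
// a hypothetical variable pointing the server at this run's isolated state
// instead of the real repository.
function writeWorkspaceMcpConfig(workspaceRoot, serverScript) {
  const config = {
    mcpServers: {
      tick: {
        command: "node",
        args: [path.resolve(serverScript)],
        env: { TICK_REPO_ROOT: path.resolve(workspaceRoot) },
      },
    },
  };
  const file = path.join(workspaceRoot, ".mcp.json");
  fs.writeFileSync(file, JSON.stringify(config, null, 2) + "\n");
  return file;
}
```

Resolving both paths to absolute form means the config stays valid no matter which directory the agent CLI is launched from.
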
- trial mcp-doctor: preflight health check for the tick MCP server (handshake +
  tools/list + non-mutating tick_analyze round-trip); throwaway state by
  default, --repo-root . to check real state read-only.
- .mcp.json: register the `tick` server at repo root so Claude Code can
  coordinate via tick_* tools directly.
- mcp/client.js: extract the shared MCP stdio JSON-RPC client (dedupes the
  smoke-test client).

Regression: mcp-smoke 8/8, battery 11/11, confirm-cli-orchestration 7/7.

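The three mcp-doctor checks (handshake, tools/list, non-mutating tick_analyze round-trip) can be sketched independently of transport by injecting a `request` function that sends one JSON-RPC request and resolves with its parsed response, plus a `notify` function for notifications. `runDoctor`, the client name and the protocol version string are assumptions, not the API of mcp/client.js or bin/trial.

```javascript
// Run mcp-doctor-style checks. `request(method, params)` must resolve with a
// parsed JSON-RPC response ({ result } or { error }); `notify(method, params)`
// sends a notification. Both are injected so the checks are transport-agnostic.
async function runDoctor(request, notify = async () => {}) {
  const checks = [];

  async function check(label, method, params, isOk) {
    try {
      const res = await request(method, params);
      if (res.error) {
        checks.push({ label, passed: false, detail: res.error.message });
      } else {
        checks.push({ label, passed: Boolean(isOk(res.result)), detail: "" });
      }
    } catch (err) {
      // Covers transport failures and malformed results alike.
      checks.push({ label, passed: false, detail: String(err) });
    }
  }

  // 1. Handshake: initialize, then send the initialized notification.
  await check("handshake", "initialize", {
    protocolVersion: "2024-11-05", // assumption: illustrative version string
    capabilities: {},
    clientInfo: { name: "doctor-sketch", version: "0.0.0" },
  }, (r) => typeof r.protocolVersion === "string");
  if (checks[0].passed) await notify("notifications/initialized", {});

  // 2. The server advertises its tools, including tick_analyze.
  await check("tools/list", "tools/list", {}, (r) =>
    Array.isArray(r.tools) && r.tools.some((t) => t.name === "tick_analyze"));

  // 3. tick_analyze is described as non-mutating, so calling it should leave
  //    coordination state unchanged.
  await check("tick_analyze round-trip", "tools/call",
    { name: "tick_analyze", arguments: {} },
    (r) => Array.isArray(r.content) && !r.isError);

  return { healthy: checks.every((c) => c.passed), checks };
}
```

Injecting the transport lets the same checks run against a stub in tests, or against a spawned server over stdio in a real health check.
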