Add trial harness and MCP server for automated multi-agent coordination by noelsaw1 · Pull Request #45 · Hypercart-Dev-Tools/AI-DDTK-Fix-Iterate-Loop

noelsaw1 · 2026-06-08T05:35:04Z

Summary

This PR adds two major new capabilities to the Trinity coordination layer:

Trial harness (experiments/coordination-layer/harness/) — a complete system for running automated, repeatable multi-agent coordination trials from the CLI, replacing manual chat-panel babysitting with headless agent execution.
MCP server (experiments/coordination-layer/mcp/tick-mcp.js) — exposes all coordination verbs as typed MCP tools, allowing agents to coordinate via MCP instead of shelling out to the CLI.

Both are thin adapters over the existing src/ coordination engine, so behavior is identical to the CLI — they're protocol frontends, not reimplementations.

Key Changes

Trial Harness (`harness/`)

Spec parser (src/spec.js) — deterministic, zero-LLM parsing of human-authored project specs (markdown) into structured task lists with validation (duplicate IDs, cycles, empty scopes all hard-fail).
Orchestrator (src/run.js) — builds isolated per-run workspaces (throwaway git repos with their own .tick/ state), seeds backlogs, spawns agent CLIs concurrently, captures transcripts, runs verification, and generates reports.
Driver abstraction (src/drivers.js) — unified interface for invoking Gemini, Codex, Claude, and mock agents headlessly (prompt on stdin, no chat UI).
Preflight phase (src/preflight.js) — agents ask clarifying questions before work begins; questions are collected and a human gate pauses the run for review.
Observability (src/observe.js) — structured JSONL spine + per-agent transcripts + final analysis report.
CLI (bin/trial) — list, doctor, validate, preflight, run subcommands with options for agent override, transport selection, timeouts, and circuit-breaker tuning.
Trial specs: four example projects (build-todo-api, build-url-shortener, debug-calc-bugs, debug-poisoned-task) demonstrating build, debug, and circuit-breaker scenarios.
Fixtures — seeded-bug code (calc-bugs, poison) for debug trials.
Test harness (test/smoke.sh) — validates the entire battery with the deterministic mock driver (no API keys needed).
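
The strict validation described for the spec parser can be pictured with a small sketch. This is not the code in src/spec.js: the task shape (`id`, `scope`, `deps`) and the name `validateTasks` are assumptions made for illustration, and the unknown-dependency check is an extra that the PR description does not mention. It hard-fails by throwing on duplicate IDs, empty scopes, and dependency cycles.

```javascript
// Hypothetical task shape: { id: string, scope: string[], deps?: string[] }.
// Not the PR's parser; a sketch of the hard-fail checks it describes.
function validateTasks(tasks) {
  const byId = new Map();
  for (const task of tasks) {
    if (byId.has(task.id)) throw new Error(`duplicate task id: ${task.id}`);
    if (!Array.isArray(task.scope) || task.scope.length === 0) {
      throw new Error(`empty scope: ${task.id}`);
    }
    byId.set(task.id, task);
  }

  // Depth-first search; an id seen again while still 'visiting' is a cycle.
  const state = new Map(); // id -> 'visiting' | 'done' (absent = unvisited)
  const visit = (id, trail) => {
    if (state.get(id) === 'done') return;
    if (state.get(id) === 'visiting') {
      throw new Error(`dependency cycle: ${[...trail, id].join(' -> ')}`);
    }
    const task = byId.get(id);
    // Assumption: a dependency on an undeclared task is also an error.
    if (!task) throw new Error(`unknown dependency: ${id}`);
    state.set(id, 'visiting');
    for (const dep of task.deps ?? []) visit(dep, [...trail, id]);
    state.set(id, 'done');
  };
  for (const id of byId.keys()) visit(id, []);
  return tasks;
}
```

Throwing on the first problem, rather than collecting warnings, matches the "hard-fail" wording: a spec that does not validate never reaches the orchestrator.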

MCP Server (`mcp/tick-mcp.js`)

Minimal JSON-RPC 2.0 over newline-delimited stdio (the MCP stdio transport).
Zero dependencies (per spike rules).
Implements initialize, tools/list, tools/call, ping.
Tools map 1:1 to CLI verbs: tick_init, tick_log, tick_project, tick_take, tick_next, tick_claim, tick_scope, tick_release, tick_break, tick_done, tick_reap, tick_analyze.
Smoke test (mcp/test/mcp-smoke.js) validates the protocol end-to-end.
Example .mcp.json wiring for Claude Code.
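
To make the protocol surface concrete, here is a minimal sketch of a dispatcher for the four methods listed above. It is not the contents of mcp/tick-mcp.js: the `TOOLS` table, the stubbed `tick_analyze` handler, the server name, and the `protocolVersion` string are all assumptions. The real server delegates to the src/ engine.

```javascript
// Hypothetical tool table with a single stubbed tool.
const TOOLS = new Map([
  ['tick_analyze', {
    description: 'Report coordination state (stubbed in this sketch).',
    inputSchema: { type: 'object', properties: {} },
    run: () => 'ok',
  }],
]);

// Handle one newline-delimited JSON-RPC 2.0 message. Returns the reply as a
// JSON string, or null for notifications (messages that carry no id).
function handleLine(line) {
  let msg;
  try {
    msg = JSON.parse(line);
  } catch {
    return JSON.stringify({ jsonrpc: '2.0', id: null, error: { code: -32700, message: 'Parse error' } });
  }
  if (msg.id === undefined) return null;
  const reply = (result) => JSON.stringify({ jsonrpc: '2.0', id: msg.id, result });
  const fail = (code, message) => JSON.stringify({ jsonrpc: '2.0', id: msg.id, error: { code, message } });

  switch (msg.method) {
    case 'initialize':
      return reply({
        protocolVersion: '2024-11-05', // assumed protocol revision
        capabilities: { tools: {} },
        serverInfo: { name: 'tick-sketch', version: '0.0.0' }, // made-up identity
      });
    case 'ping':
      return reply({});
    case 'tools/list':
      return reply({
        tools: [...TOOLS].map(([name, t]) => ({
          name, description: t.description, inputSchema: t.inputSchema,
        })),
      });
    case 'tools/call': {
      const tool = TOOLS.get(msg.params?.name);
      if (!tool) return fail(-32602, `Unknown tool: ${msg.params?.name}`);
      const text = tool.run(msg.params.arguments ?? {});
      return reply({ content: [{ type: 'text', text }] });
    }
    default:
      return fail(-32601, `Method not found: ${msg.method}`);
  }
}
```

Wiring this to the stdio transport would amount to reading lines from `process.stdin` (for example with `node:readline`) and writing each non-null reply followed by a newline to `process.stdout`. Keeping the dispatcher a pure function of one input line makes it easy to exercise without spawning a process.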

Integration

Agent prompts for both CLI (prompts/agent-loop.md) and MCP (prompts/agent-loop-mcp.md) modes.
Preflight prompt (prompts/preflight.md).
Prompt builder (src/prompts.js) with template rendering.
Process runner (src/proc.js) — spawn, feed stdin, capture stdout/stderr, enforce timeouts.
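
The process runner's contract (spawn, feed stdin, capture stdout/stderr, enforce timeouts) can be sketched with Node's built-in `child_process` module. This is an illustration, not src/proc.js: the name `runHeadless` and the shape of its return value are assumptions, and it is synchronous for brevity, whereas a harness that runs agents concurrently would need the asynchronous `spawn`.

```javascript
const { spawnSync } = require('node:child_process');

// Run a command headlessly: prompt on stdin, output captured, hard timeout.
// Hypothetical helper; the real runner's interface may differ.
function runHeadless(command, args, prompt, timeoutMs) {
  const result = spawnSync(command, args, {
    input: prompt,       // written to the child's stdin, which is then closed
    encoding: 'utf8',    // return stdout/stderr as strings instead of Buffers
    timeout: timeoutMs,  // the child is killed if it runs longer than this
  });
  return {
    stdout: result.stdout ?? '',
    stderr: result.stderr ?? '',
    exitCode: result.status, // null when the child was killed by a signal
    timedOut: result.error?.code === 'ETIMEDOUT',
  };
}
```

A call such as `runHeadless('gemini', ['--yolo'], promptText, 600000)` would mirror the command shape quoted in the commit messages; the timeout value here is arbitrary.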

Notable Implementation Details

Isolation by design: Each trial run gets its own workspace with its own .tick/ state; the real repo is never touched. Trials are safe to run repeatedly and in parallel.
Deterministic mock driver (harness/src/mock-agent.js and test/fake-cli/) — stands in for real Gemini/Codex CLIs, exercises the full coordination protocol and harness observability without API keys.
Spec validation is strict — duplicate task IDs, empty scopes, non-numeric priority, dependency cycles, and

https://claude.ai/code/session_01WnzAdCRGrrhukvW1etFLyB

Replaces manual VS Code chat-panel coordination with a headless, observable, CLI-driven trial runner for the coordination layer.

- bin/trial: list | doctor | validate | preflight | run
- driver abstraction (gemini --yolo, codex exec --full-auto, claude -p) + a deterministic mock driver so the battery runs with no API keys
- deterministic PROJECT-SPEC parser (ingestion stage 1) with validation
- preflight question round + human gate before any work
- isolated per-run workspace (own .tick/ + throwaway git repo)
- observability: run.jsonl spine, per-agent transcripts, report/SUMMARY.md
- battery: 2 build + 2 debug scenarios; test/smoke.sh green 11/11

https://claude.ai/code/session_01WnzAdCRGrrhukvW1etFLyB

- mcp/tick-mcp.js: zero-dependency MCP stdio server exposing all 13 tick coordination verbs as typed tools, a drop-in alternative to the CLI (same src/ engine). Wiring docs + example .mcp.json + mcp-smoke.js (8/8).
- harness --transport cli|mcp: agents coordinate via ./tick or the MCP tools; MCP mode auto-writes a workspace .mcp.json bound to the run's isolated state, and uses an MCP-flavored agent prompt.
- harness/test/confirm-cli-orchestration.sh + test/fake-cli/: confirm the harness executes and monitors the real gemini/codex command shapes (gemini --yolo, codex exec --full-auto -): prompt on stdin, identity from prompt, coordinate via tick, monitored to clean exit, path-routed with no collisions (7/7).

https://claude.ai/code/session_01WnzAdCRGrrhukvW1etFLyB
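
As a rough picture of what "auto-writes a workspace .mcp.json bound to the run's isolated state" could involve, the sketch below creates a throwaway directory with its own .tick/ folder and writes a config in the `mcpServers` layout that Claude Code reads. The function name `createMcpWorkspace` and the `TICK_ROOT` environment variable are invented for illustration; the PR does not say how the server is actually pointed at the run's state.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Create a throwaway workspace with its own .tick/ directory and a .mcp.json
// that tells an MCP client how to launch the tick server for this run.
function createMcpWorkspace(serverScript) {
  const workspace = fs.mkdtempSync(path.join(os.tmpdir(), 'trial-'));
  fs.mkdirSync(path.join(workspace, '.tick'));
  const config = {
    mcpServers: {
      tick: {
        command: 'node',
        args: [serverScript],
        // ASSUMPTION: TICK_ROOT is a made-up variable name standing in for
        // whatever mechanism the real harness uses to bind the state.
        env: { TICK_ROOT: workspace },
      },
    },
  };
  fs.writeFileSync(
    path.join(workspace, '.mcp.json'),
    JSON.stringify(config, null, 2) + '\n'
  );
  return workspace;
}
```

Because every run gets a fresh directory from `mkdtempSync`, two trials started at the same moment cannot share state, which is the property the isolation notes rely on.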

- trial mcp-doctor: preflight health check for the tick MCP server (handshake + tools/list + non-mutating tick_analyze round-trip); throwaway state by default, --repo-root . to check real state read-only.
- .mcp.json: register the `tick` server at repo root so Claude Code can coordinate via tick_* tools directly.
- mcp/client.js: extract the shared MCP stdio JSON-RPC client (dedupes the smoke-test client).

Regression: mcp-smoke 8/8, battery 11/11, confirm-cli-orchestration 7/7.

https://claude.ai/code/session_01WnzAdCRGrrhukvW1etFLyB

claude added 3 commits June 8, 2026 05:16

noelsaw1 wants to merge 3 commits into experiment/coordination-layer from claude/hopeful-noether-9xJXI
