dbc-oduffy/coordinator-claude

coordinator-claude

A PM-native operating layer for AI engineering work. An opinionated Claude Code plugin that turns product intent into scoped plans, delegated implementation, evidence, and ship/no-ship decisions. You're the PM (product manager). Claude is the EM (engineering manager). You stay technical enough to spot when something's wrong; the EM handles the rest.

Who This Is For

You don't need to write code. You need to know enough to evaluate it — to spot when something smells wrong, to ask the right questions, to set direction. Amjad Masad (Replit's founder) opined that people who don't program may actually be better positioned for the LLM-coding world: they won't micromanage implementation. This plugin leans into that. It gives Claude standing authority to make engineering decisions — how to build, who to delegate to, when to refactor — while you hold product authority: what to build, what to cut, what ships. Just like an EM-PM dynamic.

What this system asks of the PM is altitude, not tactics. Architectural judgment, scope decisions, refactor-vs-patch calls, what-ships-and-what-doesn't. Slop is fought on the plan level — by treating each plan as an exhaustive software design document with named-reviewer pushback baked in, not by spot-checking commits afterward. The natural model defaults — act fast and move on, defer P2s into oblivion, size work in human-sprint timeframes — are the things this system is built to push back against. The PM sets direction; the EM and the named reviewers do the pushing.

This isn't a collection of prompt tricks. It's a decision architecture: routines (session orientation, plan review, multi-source code review, structured handoffs, daily flows) plus authority boundaries (EM owns implementation, PM owns scope/ship) plus reviewer personas that push back with fix-gates between them. Reviews are meaningful — blocking findings, not theatre. And the system actively reads its own accumulated wikis and lessons before drafting new plans (prior-art-checker pre-flight, /learn-lessons triage), so the scar tissue from past decisions informs the next one rather than sitting in a write-only memory file. It maps to how PM-EM collaboration actually feels in a real engineering team. The difference is that your "team" can work autonomously for hours, and you can review the output when you're ready.

What This Is Not

  • Not another agent framework. Claude Code already has subagents, plugins, skills, hooks, and MCP. This sits on top of that infrastructure — it doesn't compete with it.
  • Not a PRD-to-code pipeline. Product intent is captured inside plan mode and acceptance criteria, not in a separate PRD funnel.
  • Not an autonomous coding agent. The PM is in the loop on scope, tradeoffs, and ship decisions by design. Autonomous modes (/mise-en-place, /autonomous) exist for execution sprints, not for product authority.
  • Not a developer productivity tool. The target user evaluates engineering work — they don't write it. If you want to be more productive at writing code yourself, you want vanilla Claude Code; this is for managing AI engineering work, not doing it.
  • Not aimed at the fully non-technical PM (yet). The current sweet spot is a PM with technical altitude but not technical hands — someone who has architectural judgment, recognizes unwise plan-shape, and can spot when a plan, an architectural choice, or a tradeoff sounds wrong from how the EM describes it, without ever reading a diff. The PM does not review commits, does not perform code review, does not sign off on PRs. That altitude is the load-bearing constraint: code review and PR sign-off are delegated to AI peers and named-persona staff-tier reviewers (see below). Complex codebases require a PM who can tell when something looks wrong from the conversation, not from the patch.

🤖 Agents: start here → AGENTS.md

Quick Start

You don't install this — your agent does. Open Claude Code in any project and paste:

Install coordinator-claude. The playbook is at
https://github.com/dbc-oduffy/coordinator-claude/blob/main/docs/agent-install.md
— read it, follow it, and queue /project-onboarding as the immediate next step
after I restart Claude Code.

Claude clones the repo, runs the installer, validates the result, and tells you when to restart. After restart, /project-onboarding bootstraps tracking infrastructure in your project.

Auditing & uninstall: docs/safety.md covers what the installer changes, what it does not do, audit commands, and exact uninstall steps.

Compatibility

| coordinator-claude | Claude Code | OS tested | Notes |
| --- | --- | --- | --- |
| v2.0.0 | tested with Claude Code 2026-05-07 release; minimum version not formally established | macOS, Linux, WSL, Windows (Git Bash) | Reference: setup/install.sh. Requires bash 4.0+. macOS ships bash 3.2 as /bin/bash, so Mac users brew install bash and run the installer with it ("$(brew --prefix)/bin/bash" setup/install.sh); the installer fails fast with this guidance if run under 3.2. Linux/WSL/Git Bash ship bash 4+. v2.0.0 has breaking changes; see CHANGELOG. |

How a Session Works

Most tools hand you a bag of commands and wish you luck. This system has routines — woven into the session lifecycle so you don't have to remember them. Some fire automatically via lifecycle hooks (boot orientation, the context-pressure nudge); the rest are one-keystroke ceremonies the EM runs at the right moment. The distinction matters: hooks are what actually fire on their own; the slash commands below are invoked — by you, or by the EM on your behalf.

Starting up. When Claude opens a supported project, a SessionStart hook fires automatically, loading the current branch, pending handoffs, lessons from past sessions, project vitals, and an orientation cache. No cold start. Claude begins with forward-looking state already loaded. This is deliberate: Liu et al. (2023, "Lost in the Middle") found that LLMs use information placed at the beginning or end of the context most reliably and degrade on information buried in the middle. The orientation hook front-loads the key context so it sits at the start of the window, where it is more likely to be used.

Planning. You describe what you want. Claude enters plan mode — but the plan isn't just written and executed. You review it. In a real dev team, the PM doesn't just say "build auth" and disappear; they review the spec, push back on scope, ask about edge cases. That's what happens here. The coordinator:plan skill is a decision-tree super-skill (triage → substrate → compose → exit) that mechanically binds the trigger word "plan" to the full pipeline: the prior-art-checker pre-flight cross-references the plan against accumulated wikis, lessons, and improvement queues to surface conflicts before an Opus reviewer touches it; the Staff Engineer then reviews; the integrator applies findings. For bigger decisions, /staff-session spawns role-based engineers who independently develop positions and debate to consensus — like pulling your tech lead and director of engineering into a room.

Building. Claude delegates to Sonnet subagents for implementation — cheaper, faster, fresh context. A PreToolUse hook nudges Claude away from doing implementation work directly, because the orchestrator's context is too valuable to spend on writing code. This is the same principle as a real EM: you don't want your engineering manager writing production code when they should be coordinating.
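The delegation nudge can be pictured with a small sketch. This is not the plugin's actual hook: the set of tool names treated as implementation work, the message text, and the `nudge_for` helper are all assumptions made for illustration. Only the idea of a PreToolUse check that steers the orchestrator toward delegation comes from the text above.

```python
from typing import Optional

# Hypothetical: tools this sketch treats as "implementation work".
# The real hook may flag a different set.
IMPLEMENTATION_TOOLS = {"Edit", "Write", "MultiEdit"}


def nudge_for(tool_name: str, is_orchestrator: bool) -> Optional[str]:
    """Return a reminder when the orchestrator reaches for an implementation tool."""
    if is_orchestrator and tool_name in IMPLEMENTATION_TOOLS:
        return (
            f"{tool_name} is implementation work; "
            "consider delegating it to a subagent."
        )
    # No nudge: subagents, and read-only tools, proceed unchanged.
    return None
```

Under these assumptions the check is advisory: it returns a message rather than blocking, which matches the "nudge" wording above.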

Reviewing. Code review comes from named personas with rich behavioral profiles: a domain expert reviews first (e.g., a web-dev specialist for front-end work, or a data-science specialist for ML pipelines), all findings are applied, then a generalist reviews the clean artifact. Sequential, with mandatory fix gates. Research supports both the persona mechanism (literature-backed for judgment-routing tasks; for mechanical bug-finding our own controlled experiment found no recall gain, which is why bug-sweeps use bare agents) and multi-agent gains (Anthropic reported that its multi-agent research system outperformed a single agent by 90.2% on an internal research eval; that result concerns research tasks, not code review specifically). Plan-review with personas, the system's main use of named reviewers, leans on industry-standard PRD/SDD review patterns; we have not separately benchmarked it.
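A minimal sketch of sequential review with fix gates, assuming reviewers and the fixer are plain callables. The names `Finding`, `run_review_chain`, and the re-review used as the gate are hypothetical; the plugin's real pipeline dispatches agents, not Python functions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    message: str
    blocking: bool


# A reviewer reads the artifact and returns its findings.
Reviewer = Callable[[str], List[Finding]]
# A fixer applies findings and returns the revised artifact.
Fixer = Callable[[str, List[Finding]], str]


def run_review_chain(artifact: str, reviewers: List[Reviewer], apply_fixes: Fixer) -> str:
    """Run reviewers in order; each sees the artifact after earlier fixes."""
    for review in reviewers:
        findings = review(artifact)
        if findings:
            artifact = apply_fixes(artifact, findings)
            # Fix gate: the same reviewer re-checks, and any blocking
            # finding that survives stops the chain.
            remaining = [f for f in review(artifact) if f.blocking]
            if remaining:
                raise RuntimeError(f"fix gate failed: {remaining[0].message}")
    return artifact
```

Passing `[domain_reviewer, generalist_reviewer]` in that order reproduces the "domain expert first, generalist second" sequence, with the generalist only ever seeing the already-fixed artifact.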

Staying coherent. Long sessions hit a hard constraint: context compaction. When triggered, the model summarizes what it thinks happened — a retrospective reconstruction that loses intent. A PostToolUse hook monitors context pressure and prompts Claude to create a structured handoff before compaction fires: decisions made, state reached, explicit next steps. Each handoff chains forward from its predecessor. Research shows structured handoffs significantly outperform automatic summarization (Kang et al. 2025, ACON; Sourcegraph retired compaction in their Amp agent in favor of explicit handoffs after measuring degradation).
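The context-pressure check might look something like the sketch below. The 0.75 threshold, the function name, and the token-count inputs are assumptions; the text above only states that a PostToolUse hook monitors pressure and prompts a handoff before compaction, and that /autonomous suppresses the nudge.

```python
def should_prompt_handoff(
    tokens_used: int,
    context_limit: int,
    threshold: float = 0.75,  # hypothetical trigger point
    autonomous: bool = False,
) -> bool:
    """Decide whether to nudge for a structured handoff before compaction."""
    if autonomous:
        # Autonomous mode suppresses handoff nudges.
        return False
    if context_limit <= 0:
        raise ValueError("context_limit must be positive")
    return tokens_used / context_limit >= threshold
```

The point of checking after each tool call is timing: the handoff is written while the session still holds its full intent, before any automatic summary replaces it.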

Navigating the codebase. This system invests heavily in documentation-as-navigation. Claude's natural mode is grep-heavy — searching text, reading prose, following paper trails. An architecture atlas, project tracker, orientation caches, and structured comments throughout the codebase give Claude something to find when it searches. We call this "grep bait." It's why the doc maintenance pipeline and architecture atlas exist: not bureaucracy, but navigation infrastructure that lets Claude plan from 60 lines of orientation instead of reading 20 source files cold. Research artifacts, lessons files, and handoff documents all serve double duty — they record decisions and create searchable landmarks for future sessions.

Wrapping up. /session-end captures lessons, updates documentation, and commits state. /workday-complete goes further: syncs docs, merges to main via PR, and optionally hibernates the machine. The cycle is continuous — each session starts where the last one left off.

What You Need to Remember

The EM handles most of this on its own. Your key moves:

| Command | When | What It Does |
| --- | --- | --- |
| /session-start | Beginning of work | Deliberate orientation: triage handoffs, surface staleness, choose work. PM-invoked; not auto-fired. (Boot context loads automatically via a separate SessionStart hook; this command is the optional deeper orient.) |
| /pickup | Resuming work | Load a handoff artifact and continue where you left off |
| /handoff | Stepping away | Save session state for the next session |
| /staff-session | Big decisions | Multi-perspective planning or review from persona-based contributors |
| /mise-en-place | Heads-down time | Claude burns through the backlog autonomously, no input needed |
| /autonomous | Override | Suppress handoff nudges when you want Claude to push through compaction |

That's it for daily use. Everything else — delegation, review routing, doc maintenance — the EM drives as part of its workflow; context-pressure management is the one piece handled automatically, by a PostToolUse hook. You don't have to ask for any of it.

Flows

Don't memorize commands; learn five flows. Most of what the system does, you'll touch through one of these.

Flow 1 — Build a feature. You describe intent → Claude enters plan mode and proposes acceptance criteria + scope mode → you review and approve → Claude delegates implementation → reviewers (domain expert first, generalist second) check the artifact with fix gates between → for user-visible work or patches that smell like they should be refactors, the VP-Product Reviewer (coordinator:vp-product) — scope challenger, naming optional via setup/name-personas.sh — stress-tests the choice → /merging-to-main produces a ship verdict and you decide.

Flow 2 — Fix a bug. Reproduction first (don't trust the report) → root cause via the systematic-debugging guide → scoped fix in production-patch mode (minimal diff, no opportunistic refactors) → regression check → reviewer → merge. For codebase-wide grinds, /bug-blitz autonomously works through the bug backlog with EM-serial commits at each wave gate.

Flow 3 — Resume work. The boot SessionStart hook automatically loads orientation, lessons, and pending handoffs into context → optionally run /session-start for a deeper triage → you pick up via /pickup <handoff> or pick from the menu → Claude lands mid-context and starts where the last session stopped.

Flow 4 — Autonomous sprint. /mise-en-place gathers ready work, builds a compaction-proof flight recorder, and executes through the backlog without stopping. /autonomous suppresses handoff nudges when you want it to push through context pressure.

Flow 5 — Architecture change. /staff-session plan (multi-perspective debate) → migration plan with rollback → architecture mode review → implementation → verification → architecture atlas update via /update-docs.

Closing the day: /workday-complete validates, syncs, runs the daily review, and merges. /workweek-complete is the larger weekly ceremony with version bump and release notes.

How heavy is the workflow?

The system scales — a typo fix is a two-word instruction; a system rewrite is a multi-agent pipeline. Here are three representative tiers:

| Tier | Skill / command | Reviewer | Wall time |
| --- | --- | --- | --- |
| Tiny edit (typo, constant, rename) | Direct EM edit, no plan | None required | < 5 min |
| Feature (new command, new skill) | /execute-plan after PM approves a plan | Domain reviewer → the Staff Engineer (coordinator:staff-eng) generalist; the Director of Engineering (coordinator:eng-director) as backstop at High effort (sequential) | 30 min – 2 hrs |
| System rewrite (multi-plugin overhaul) | /staff-session plan/execute-plan (with executor dispatch per docs/wiki/delegate-execution.md) | Full sequential chain + PM ship verdict | Half day+ |

See docs/wiki/task-tier-guidance.md for the full tier table, reviewer routing guide, and flow diagrams.

All Commands (appendix)

| Command | Purpose | Why It Exists |
| --- | --- | --- |
| /session-start | Orient to project, load context, choose work | Eliminate cold starts; position key context optimally |
| /session-end | Capture lessons, update docs, commit | Preserve institutional knowledge between sessions |
| /pickup | Resume from a handoff artifact | Structured continuity beats re-reading git log |
| /handoff | Save state before stepping away | Prospective capture > retrospective reconstruction |
| /staff-session | Multi-perspective planning or review | Debate surfaces tradeoffs a solo agent misses |
| /mise-en-place | Autonomous backlog execution | Front-load all context, then execute without interruption |
| /autonomous | Toggle autonomous mode | Trust escalation: suppress handoff nudges |
| /workday-start | Morning orientation and triage | Surface staleness, align priorities, suggest work |
| /workday-complete | End-of-day wrap-up | Daily housekeeping: validate, consolidate, append week-changelog |
| /workweek-start | Strategic weekly orient | Surface last week's results, set this week's priorities |
| /workweek-complete | Release-grade weekly close | Full validation, version bump, parallel code review, merge to main |
| /review | Review a plan / design / RFC | Domain expert → generalist, sequential with fix gates |
| /review-code | Review a code change / diff / PR | Same reviewer pipeline, code-shaped artifacts |
| /execute-plan | Execute a PM-approved plan | Direct implementation without re-planning |
| /update-docs | Repo-wide documentation maintenance | 11-phase pipeline that fights doc staleness |
| /bug-sweep | Systematic codebase bug hunt | Find and fix all AI-fixable bugs in-session |
| /bug-blitz | Autonomous bug-backlog grinder | Wave-based execution with EM-serial commits at each gate |
| /dogfood | Smoke-driven fix-through loop | Binary outcome: converge on a working capability or switch gears |
| /learn-lessons | Triage tasks/lessons.md as doctrine change-requests | Three modes: local (per-repo), central (cross-repo), recheck (deferred follow-ups) |
| /code-health | Night-shift code health review | Scan today's commits, dispatch reviewer, apply findings |
| /architecture-audit | Deep architecture analysis | Multi-phase agent pipeline for system health |
| /distill | Extract knowledge from session artifacts | Turn plans and handoffs into evergreen wiki docs |

How this maps to real teams

| Real Team Practice | coordinator-claude Equivalent |
| --- | --- |
| Daily standup | /session-start: what happened, what's blocked, what's next |
| Sprint planning | /staff-session plan: persona-based engineers debate approach |
| Spec review | Plan mode with PM sign-off: Claude proposes, you approve |
| Code review | /review-code: domain expert first, generalist second, fix gates between |
| Tech lead gut-check | Any reviewer can be dispatched ad-hoc for a quick assessment |
| Retrospective | /session-end: capture lessons, update docs |
| Heads-down sprint | /mise-en-place: autonomous execution through the backlog |
| Handoff between shifts | /handoff: structured state capture, not "check the git log" |

The one role we don't have deeply embedded in workflows: designer. Meatspace designers are still better at that, and their judgment is going to remain difficult for LLMs to replicate. There's a vibe-design functionality, but it's not gonna rock your world.

Under the hood — architecture details

Inverted capability delegation. The coordinator sees ~8 thin MCP tools; domain agents access 40+ via proxy with full schemas. This saves ~40K tokens from the coordinator's context and forces delegation by design. A PreToolUse hook nudges the coordinator when it reaches for domain tools directly.

Proactive artifact generation. Before compaction fires, a hook prompts structured handoff creation — a prospective document capturing decisions, state, and next steps. Each artifact chains from its predecessor (cascade obligation) and opens with a synthesis of the prior handoff (anti-amnesia chain). See our handoff vs. compaction research.
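One way to picture the handoff chain is the sketch below. The `Handoff` class, its method names, and the rendered text format are hypothetical; the fields follow the "decisions, state, and next steps" named above, and the opening line models the synthesis of the prior handoff.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Handoff:
    decisions: List[str]
    state: str
    next_steps: List[str]
    predecessor: Optional["Handoff"] = None  # the handoff this one chains from

    def synthesis_of_prior(self) -> str:
        """Summarize the predecessor so the chain never starts from nothing."""
        if self.predecessor is None:
            return "No prior handoff."
        prior = self.predecessor
        steps = ", ".join(prior.next_steps) or "none"
        return f"Prior state: {prior.state}. Open next steps: {steps}."

    def render(self) -> str:
        """Render the handoff, opening with the synthesis of its predecessor."""
        lines = [self.synthesis_of_prior(), f"State: {self.state}"]
        lines += [f"Decision: {d}" for d in self.decisions]
        lines += [f"Next: {s}" for s in self.next_steps]
        return "\n".join(lines)
```

Because each artifact carries a reference to its predecessor, a new session can walk the chain backwards as far as it needs instead of relying on a single compacted summary.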

Role-based sequential review. Reviewers carry rich behavioral profiles — not just "code reviewer" but distinct roles with expertise domains and review lenses. Sequential review with mandatory fix gates means each reviewer sees a clean artifact. See the persona research and experiment results. Role labels ship as defaults; an optional naming flow lets users bind personal names to roles if that aids their cognitive ease.

6-layer project knowledge. Structure, architecture, activity, temporal, intent, state — none bulk-loaded. A tiered context model loads a ~60-line orientation cache at L1, pulls detailed artifacts on demand at L2, and reserves L3 for deep storage read by subagents. An 11-phase maintenance pipeline fights doc staleness automatically.

Agent Teams for planning. Claude Code's Agent Teams enables multiple Claude sessions that communicate and coordinate. This system uses it for multi-perspective planning: persona-based debaters form independent positions, challenge each other, and a synthesizer cross-references into consensus. Also powers the bundled deep-research pipelines (internet, repo, structured, NotebookLM).

Cross-model delegation. Haiku for mechanical checks, Sonnet for most execution, Opus for judgment and synthesis. Codex CLI integration is available as an opt-in add-on (setup/install.sh --enable-codex plus the external openai/codex-plugin-cc plugin) for a second-opinion channel and independent implementation path; default installs omit it.
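The tiering above amounts to a lookup from kind of work to model tier. In this sketch the `TaskKind` categories and the lowercase tier strings are assumptions for illustration, not identifiers taken from the plugin.

```python
from enum import Enum


class TaskKind(Enum):
    MECHANICAL_CHECK = "mechanical_check"  # e.g. lint-style verification
    EXECUTION = "execution"                # most implementation work
    JUDGMENT = "judgment"                  # review, synthesis, planning


# Hypothetical tier labels; real dispatch would use concrete model IDs.
MODEL_FOR = {
    TaskKind.MECHANICAL_CHECK: "haiku",
    TaskKind.EXECUTION: "sonnet",
    TaskKind.JUDGMENT: "opus",
}


def pick_model(kind: TaskKind) -> str:
    """Map a task kind to the cheapest tier considered adequate for it."""
    return MODEL_FOR[kind]
```

Keeping the mapping in one table makes the cost policy explicit: changing which tier handles a class of work is a one-line edit, not a hunt through agent prompts.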

See docs/architecture.md for the full model. For the testable claims — what each agent role promises and which hook or script enforces each promise — see docs/contracts.md. For broader context, the novelty research assesses all patterns against published prior art.

For the failure modes this system was designed around — false completion, silent scope expansion, test theater, review laundering, context amnesia, integration blindness, and a dozen others — see docs/evolution/05-failure-modes.md.

For evidence — what was actually built under this workflow, alongside the controlled persona experiment and the qualitative failure-modes ledger — see docs/research/2026-05-08-built-with-coordinator.md, docs/research/2026-03-26-persona-experiment-results.md, and docs/evolution/05-failure-modes.md. These are the three peer artifacts in the evidence corpus.

Plugins

| Plugin | Purpose | When to Enable |
| --- | --- | --- |
| coordinator | Core orchestration, reviewers, all workflow skills | Always |
| deep-research | Multi-agent research pipelines: internet (A), repo (B), structured (C) | Any project that needs grounded research |
| notebooklm | NotebookLM-backed research pipeline (D): YouTube, podcasts, audio sources | When you need media Claude can't read directly |
| web-dev | Front-end architecture review + UX flow review | Web projects |
| data-science | ML, statistics, data modeling review | ML/data work |

The coordinator plugin is always enabled. Domain plugins are toggled per-project via .claude/coordinator.local.md.

Customization

  • Name your reviewers (optional). Role labels ship as the default — bash setup/name-personas.sh "the Staff Engineer" "Alex" "the Director of Engineering" "Jordan" binds chosen names to role labels across all plugin files. See the role table in docs/customization.md for all seven roles and their slugs.
  • Create your own domain reviewer. The shipped domain plugins are the reference pattern: web-dev ships a Front-end Reviewer and a UX Reviewer, data-science ships a Data Science Reviewer (ML/statistics). Each is a self-contained plugin with an agent prompt, a CLAUDE.md, and a .claude-plugin/plugin.json — copy any of them as a starting point for your own specialization.
  • Per-project configuration. Create .claude/coordinator.local.md with project_type to control which reviewers activate.

See docs/customization.md for templates, the full persona registry, and instructions for adding skills and CI checks.

Companion Plugins

  • clangd-lsp: C++ code intelligence. Reviewer agents gain go-to-definition, find-references, and call hierarchy, which is helpful for those (like us) using Claude Code with Unreal Engine.
  • codex-plugin-cc: Codex CLI integration for parallel execution and second-opinion reviews.
  • Context7: External library documentation lookup.

All are optional. Coordinator works without them; relevant features degrade gracefully.

Directory structure

coordinator-claude/
├── plugins/
│   ├── coordinator/            # Core orchestration (always enabled)
│   │   ├── .claude-plugin/plugin.json
│   │   ├── agents/             # 18 — enricher, executor, docs-checker, reviewers, eng-director, vp-product, reviewer-routed workers
│   │   ├── commands/           # 12 workflow commands (hook/ceremony auto-runners)
│   │   ├── hooks/              # context pressure, orientation, commit validation, tier-usage telemetry
│   │   ├── pipelines/          # staff-session team protocol + prompt templates
│   │   └── skills/             # 29 skills (planning, review, debugging, TDD, etc.)
│   ├── deep-research/          # Pipelines A/B/C + 6 research agents
│   │   └── notebooklm/         # Pipeline D (media research via NotebookLM)
│   ├── web-dev/                # Front-end + UX flow reviewers
│   └── data-science/           # ML, statistics reviewer (the Data Science Reviewer)
├── docs/                       # Architecture, customization, research
├── setup/                      # Installer
└── assets/                     # Social preview

Troubleshooting

Plugins not loading:

  • Check enabledPlugins in ~/.claude/settings.json: the plugin's entry must be set to true
  • Check ~/.claude/plugins/installed_plugins.json: it must have an entry for the plugin with the correct installPath
  • Restart Claude Code (changes take effect on next session)
  • The installer (setup/install.sh) manages all config files automatically

Plugin cache not syncing after editing source:

  • Claude Code caches plugins by version. Run bash setup/dev-sync.sh to sync.

Per-project plugin selection:

  • Create .claude/coordinator.local.md with project_type field
  • Coordinator is always enabled; domain plugins activate per-project

Research

This system's design is informed by published research and validated through controlled experiments:


the PM O'Duffy & Claude