coordinator-claude

A PM-native operating layer for AI engineering work. An opinionated Claude Code plugin that turns product intent into scoped plans, delegated implementation, evidence, and ship/no-ship decisions. You're PM. Claude is EM. You stay technical enough to spot when something's wrong; the EM handles the rest.

Who This Is For

You don't need to write code. You need to know enough to evaluate it — to spot when something smells wrong, to ask the right questions, to set direction. Amjad Masad (Replit's founder) opined that people who don't program may actually be better positioned for the LLM-coding world: they won't micromanage implementation. This plugin leans into that. It gives Claude standing authority to make engineering decisions — how to build, who to delegate to, when to refactor — while you hold product authority: what to build, what to cut, what ships. Just like an EM-PM dynamic.

What this system asks of the PM is altitude, not tactics. Architectural judgment, scope decisions, refactor-vs-patch calls, what-ships-and-what-doesn't. Slop is fought on the plan level — by treating each plan as an exhaustive software design document with named-reviewer pushback baked in, not by spot-checking commits afterward. The natural model defaults — act fast and move on, defer P2s into oblivion, size work in human-sprint timeframes — are the things this system is built to push back against. The PM sets direction; the EM and the named reviewers do the pushing.

This isn't a collection of prompt tricks. It's a decision architecture: routines (session orientation, plan review, multi-source code review, structured handoffs, daily flows) plus authority boundaries (EM owns implementation, PM owns scope/ship) plus reviewer personas that push back with fix-gates between them. Reviews are meaningful — blocking findings, not theatre. And the system actively reads its own accumulated wikis and lessons before drafting new plans (prior-art-checker pre-flight, /learn-lessons triage), so the scar tissue from past decisions informs the next one rather than sitting in a write-only memory file. It maps to how PM-EM collaboration actually feels in a real engineering team. The difference is that your "team" can work autonomously for hours, and you can review the output when you're ready.

What This Is Not

Not another agent framework. Claude Code already has subagents, plugins, skills, hooks, and MCP. This sits on top of that infrastructure — it doesn't compete with it.
Not a PRD-to-code pipeline. Product intent is captured inside plan mode and acceptance criteria, not in a separate PRD funnel.
Not an autonomous coding agent. The PM is in the loop on scope, tradeoffs, and ship decisions by design. Autonomous modes (/mise-en-place, /autonomous) exist for execution sprints, not for product authority.
Not a developer productivity tool. The target user evaluates engineering work — they don't write it. If you want to be more productive at writing code yourself, you want vanilla Claude Code; this is for managing AI engineering work, not doing it.
Not aimed at the fully non-technical PM (yet). The current sweet spot is a PM with technical altitude but not technical hands — someone who has architectural judgment, recognizes unwise plan-shape, and can spot when a plan, an architectural choice, or a tradeoff sounds wrong from how the EM describes it, without ever reading a diff. The PM does not review commits, does not perform code review, does not sign off on PRs. That altitude is the load-bearing constraint: code review and PR sign-off are delegated to AI peers and named-persona staff-tier reviewers (see below). Complex codebases require a PM who can tell when something looks wrong from the conversation, not from the patch.

🤖 Agents: start here → AGENTS.md

Quick Start

You don't install this — your agent does. Open Claude Code in any project and paste:

Install coordinator-claude. The playbook is at
https://github.com/dbc-oduffy/coordinator-claude/blob/main/docs/agent-install.md
— read it, follow it, and queue /project-onboarding as the immediate next step
after I restart Claude Code.

Claude clones the repo, runs the installer, validates the result, and tells you when to restart. After restart, /project-onboarding bootstraps tracking infrastructure in your project.

Auditing & uninstall → docs/safety.md — what the installer changes, what it does not do, audit commands, and exact uninstall steps.

Compatibility

coordinator-claude	Claude Code	OS tested	Notes
v2.0.0	tested with Claude Code 2026-05-07 release; minimum version not formally established	macOS, Linux, WSL, Windows (Git Bash)	Reference: `setup/install.sh`. Requires bash 4.0+ — macOS ships bash 3.2 as `/bin/bash`, so Mac users `brew install bash` and run the installer with it (`"$(brew --prefix)/bin/bash" setup/install.sh`); the installer fails fast with this guidance if run under 3.2. Linux/WSL/Git Bash ship bash 4+. v2.0.0 has breaking changes — see CHANGELOG.

How a Session Works

Most tools hand you a bag of commands and wish you luck. This system has routines — woven into the session lifecycle so you don't have to remember them. Some fire automatically via lifecycle hooks (boot orientation, the context-pressure nudge); the rest are one-keystroke ceremonies the EM runs at the right moment. The distinction matters: hooks are what actually fire on their own; the slash commands below are invoked — by you, or by the EM on your behalf.

Starting up. When Claude opens a supported project, a SessionStart hook fires automatically — loading the current branch, pending handoffs, lessons from past sessions, project vitals, and an orientation cache. No cold start. Claude lands in the middle of the context window where performance is strongest, with forward-looking state already loaded. This is deliberate: research shows LLMs degrade toward the end of their context and, to a lesser extent, at the beginning (Liu et al. 2023, "Lost in the Middle"). The orientation hook front-loads context so the working session occupies the optimal window.

Planning. You describe what you want. Claude enters plan mode — but the plan isn't just written and executed. You review it. In a real dev team, the PM doesn't just say "build auth" and disappear; they review the spec, push back on scope, ask about edge cases. That's what happens here. The coordinator:plan skill is a decision-tree super-skill (triage → substrate → compose → exit) that mechanically binds the trigger word "plan" to the full pipeline: the prior-art-checker pre-flight cross-references the plan against accumulated wikis, lessons, and improvement queues to surface conflicts before an Opus reviewer touches it; the Staff Engineer then reviews; the integrator applies findings. For bigger decisions, /staff-session spawns role-based engineers who independently develop positions and debate to consensus — like pulling your tech lead and director of engineering into a room.

Building. Claude delegates to Sonnet subagents for implementation — cheaper, faster, fresh context. A PreToolUse hook nudges Claude away from doing implementation work directly, because the orchestrator's context is too valuable to spend on writing code. This is the same principle as a real EM: you don't want your engineering manager writing production code when they should be coordinating.

Reviewing. Code review comes from named personas with rich behavioral profiles — a domain expert reviews first (e.g., a web-dev specialist for front-end work, or a data-science specialist for ML pipelines), all findings are applied, then a generalist reviews the clean artifact. Sequential, with mandatory fix gates. Research supports both the persona mechanism (literature-backed for judgment-routing tasks; for mechanical bug-finding our own controlled experiment found no recall gain — that's why bug-sweeps use bare agents) and multi-agent review gains (Anthropic's own eval showed 90.2% improvement over single-agent). Plan-review with personas — the system's main use of named reviewers — leans on industry-standard PRD/SDD review patterns; we have not separately benchmarked it.

Staying coherent. Long sessions hit a hard constraint: context compaction. When triggered, the model summarizes what it thinks happened — a retrospective reconstruction that loses intent. A PostToolUse hook monitors context pressure and prompts Claude to create a structured handoff before compaction fires: decisions made, state reached, explicit next steps. Each handoff chains forward from its predecessor. Research shows structured handoffs significantly outperform automatic summarization (Kang et al. 2025, ACON; Sourcegraph retired compaction in their Amp agent in favor of explicit handoffs after measuring degradation).

Navigating the codebase. This system invests heavily in documentation-as-navigation. Claude's natural mode is grep-heavy — searching text, reading prose, following paper trails. An architecture atlas, project tracker, orientation caches, and structured comments throughout the codebase give Claude something to find when it searches. We call this "grep bait." It's why the doc maintenance pipeline and architecture atlas exist: not bureaucracy, but navigation infrastructure that lets Claude plan from 60 lines of orientation instead of reading 20 source files cold. Research artifacts, lessons files, and handoff documents all serve double duty — they record decisions and create searchable landmarks for future sessions.

Wrapping up. /session-end captures lessons, updates documentation, and commits state. /workday-complete goes further: syncs docs, merges to main via PR, and optionally hibernates the machine. The cycle is continuous — each session starts where the last one left off.

What You Need to Remember

The EM handles most of this on its own. Your key moves:

Command	When	What It Does
`/session-start`	Beginning of work	Deliberate orientation — triage handoffs, surface staleness, choose work. PM-invoked; not auto-fired. (Boot context loads automatically via a separate `SessionStart` hook; this command is the optional deeper orient.)
`/pickup`	Resuming work	Load a handoff artifact and continue where you left off
`/handoff`	Stepping away	Save session state for the next session
`/staff-session`	Big decisions	Multi-perspective planning or review from persona-based contributors
`/mise-en-place`	Heads-down time	Claude burns through the backlog autonomously — no input needed
`/autonomous`	Override	Suppress handoff nudges when you want Claude to push through compaction

That's it for daily use. Everything else — delegation, review routing, doc maintenance — the EM drives as part of its workflow; context-pressure management is the one piece handled automatically, by a PostToolUse hook. You don't have to ask for any of it.

Flows

Don't memorize commands; learn five flows. Most of what the system does, you'll touch through one of these.

Flow 1 — Build a feature. You describe intent → Claude enters plan mode and proposes acceptance criteria + scope mode → you review and approve → Claude delegates implementation → reviewers (domain expert first, generalist second) check the artifact with fix gates between → for user-visible work or patches that smell like they should be refactors, the VP-Product Reviewer (coordinator:vp-product) — scope challenger, naming optional via setup/name-personas.sh — stress-tests the choice → /merging-to-main produces a ship verdict and you decide.

Flow 2 — Fix a bug. Reproduction first (don't trust the report) → root cause via the systematic-debugging guide → scoped fix in production-patch mode (minimal diff, no opportunistic refactors) → regression check → reviewer → merge. For codebase-wide grinds, /bug-blitz autonomously works through the bug backlog with EM-serial commits at each wave gate.

Flow 3 — Resume work. The boot SessionStart hook automatically loads orientation, lessons, and pending handoffs into context → optionally run /session-start for a deeper triage → you pick up via /pickup <handoff> or pick from the menu → Claude lands mid-context and starts where the last session stopped.

Flow 4 — Autonomous sprint. /mise-en-place gathers ready work, builds a compaction-proof flight recorder, and executes through the backlog without stopping. /autonomous suppresses handoff nudges when you want it to push through context pressure.

Flow 5 — Architecture change. /staff-session plan (multi-perspective debate) → migration plan with rollback → architecture mode review → implementation → verification → architecture atlas update via /update-docs.

Closing the day: /workday-complete validates, syncs, runs the daily review, and merges. /workweek-complete is the larger weekly ceremony with version bump and release notes.

How heavy is the workflow?

The system scales — a typo fix is a two-word instruction; a system rewrite is a multi-agent pipeline. Here are three representative tiers:

Tier	Skill / command	Reviewer	Wall time
Tiny edit (typo, constant, rename)	Direct EM edit — no plan	None required	< 5 min
Feature (new command, new skill)	`/execute-plan` after PM approves a plan	Domain reviewer → the Staff Engineer (`coordinator:staff-eng`) generalist; the Director of Engineering (`coordinator:eng-director`) as backstop at High effort (sequential)	30 min – 2 hrs
System rewrite (multi-plugin overhaul)	`/staff-session plan` → `/execute-plan` (with executor dispatch per `docs/wiki/delegate-execution.md`)	Full sequential chain + PM ship verdict	Half day+

See docs/wiki/task-tier-guidance.md for the full tier table, reviewer routing guide, and flow diagrams.

All Commands (appendix)

Command	Purpose	Why It Exists
`/session-start`	Orient to project, load context, choose work	Eliminate cold starts; position key context optimally
`/session-end`	Capture lessons, update docs, commit	Preserve institutional knowledge between sessions
`/pickup`	Resume from a handoff artifact	Structured continuity beats re-reading git log
`/handoff`	Save state before stepping away	Prospective capture > retrospective reconstruction
`/staff-session`	Multi-perspective planning or review	Debate surfaces tradeoffs a solo agent misses
`/mise-en-place`	Autonomous backlog execution	Front-load all context, then execute without interruption
`/autonomous`	Toggle autonomous mode	Trust escalation — suppress handoff nudges
`/workday-start`	Morning orientation and triage	Surface staleness, align priorities, suggest work
`/workday-complete`	End-of-day wrap-up	Daily housekeeping: validate, consolidate, append week-changelog
`/workweek-start`	Strategic weekly orient	Surface last week's results, set this week's priorities
`/workweek-complete`	Release-grade weekly close	Full validation, version bump, parallel code review, merge to main
`/review`	Review a plan / design / RFC	Domain expert → generalist, sequential with fix gates
`/review-code`	Review a code change / diff / PR	Same reviewer pipeline, code-shaped artifacts
`/execute-plan`	Execute a PM-approved plan	Direct implementation without re-planning
`/update-docs`	Repo-wide documentation maintenance	11-phase pipeline that fights doc staleness
`/bug-sweep`	Systematic codebase bug hunt	Find and fix all AI-fixable bugs in-session
`/bug-blitz`	Autonomous bug-backlog grinder	Wave-based execution with EM-serial commits at each gate
`/dogfood`	Smoke-driven fix-through loop	Binary outcome: converge on a working capability or switch gears
`/learn-lessons`	Triage `tasks/lessons.md` as doctrine change-requests	Three modes: local (per-repo), central (cross-repo), recheck (deferred follow-ups)
`/code-health`	Night-shift code health review	Scan today's commits, dispatch reviewer, apply findings
`/architecture-audit`	Deep architecture analysis	Multi-phase agent pipeline for system health
`/distill`	Extract knowledge from session artifacts	Turn plans and handoffs into evergreen wiki docs

How this maps to real teams

Real Team Practice	coordinator-claude Equivalent
Daily standup	`/session-start` — what happened, what's blocked, what's next
Sprint planning	`/staff-session plan` — persona-based engineers debate approach
Spec review	Plan mode with PM sign-off — Claude proposes, you approve
Code review	`/review-code` — domain expert first, generalist second, fix gates between
Tech lead gut-check	Any reviewer can be dispatched ad-hoc for a quick assessment
Retrospective	`/session-end` — capture lessons, update docs
Heads-down sprint	`/mise-en-place` — autonomous execution through the backlog
Handoff between shifts	`/handoff` — structured state capture, not "check the git log"

The one role we don't have deeply embedded in workflows: designer. Meatspace designers are still better at that, and their judgment is going to remain difficult for LLMs to replicate. There's a vibe-design functionality, but it's not gonna rock your world.

Under the hood — architecture details

Inverted capability delegation. The coordinator sees ~8 thin MCP tools; domain agents access 40+ via proxy with full schemas. This saves ~40K tokens from the coordinator's context and forces delegation by design. A PreToolUse hook nudges the coordinator when it reaches for domain tools directly.

Proactive artifact generation. Before compaction fires, a hook prompts structured handoff creation — a prospective document capturing decisions, state, and next steps. Each artifact chains from its predecessor (cascade obligation) and opens with a synthesis of the prior handoff (anti-amnesia chain). See our handoff vs. compaction research.

Role-based sequential review. Reviewers carry rich behavioral profiles — not just "code reviewer" but distinct roles with expertise domains and review lenses. Sequential review with mandatory fix gates means each reviewer sees a clean artifact. See the persona research and experiment results. Role labels ship as defaults; an optional naming flow lets users bind personal names to roles if that aids their cognitive ease.

6-layer project knowledge. Structure, architecture, activity, temporal, intent, state — none bulk-loaded. A tiered context model loads a ~60-line orientation cache at L1, pulls detailed artifacts on demand at L2, and reserves L3 for deep storage read by subagents. An 11-phase maintenance pipeline fights doc staleness automatically.

Agent Teams for planning. Claude Code's Agent Teams enables multiple Claude sessions that communicate and coordinate. This system uses it for multi-perspective planning: persona-based debaters form independent positions, challenge each other, and a synthesizer cross-references into consensus. Also powers the bundled deep-research pipelines (internet, repo, structured, NotebookLM).

Cross-model delegation. Haiku for mechanical checks, Sonnet for most execution, Opus for judgment and synthesis. Codex CLI integration is available as an opt-in add-on (setup/install.sh --enable-codex plus the external openai/codex-plugin-cc plugin) for a second-opinion channel and independent implementation path; default installs omit it.

See docs/architecture.md for the full model. For the testable claims — what each agent role promises and which hook or script enforces each promise — see docs/contracts.md. For broader context, the novelty research assesses all patterns against published prior art.

For the failure modes this system was designed around — false completion, silent scope expansion, test theater, review laundering, context amnesia, integration blindness, and a dozen others — see docs/evolution/05-failure-modes.md.

For evidence — what was actually built under this workflow, alongside the controlled persona experiment and the qualitative failure-modes ledger — see docs/research/2026-05-08-built-with-coordinator.md, docs/research/2026-03-26-persona-experiment-results.md, and docs/evolution/05-failure-modes.md. These are the three peer artifacts in the evidence corpus.

Plugins

Plugin	Purpose	When to Enable
coordinator	Core orchestration, reviewers, all workflow skills	Always
deep-research	Multi-agent research pipelines — internet (A), repo (B), structured (C)	Any project that needs grounded research
notebooklm	NotebookLM-backed research pipeline (D) — YouTube, podcasts, audio sources	When you need media Claude can't read directly
web-dev	Front-end architecture review + UX flow review	Web projects
data-science	ML, statistics, data modeling review	ML/data work

The coordinator plugin is always enabled. Domain plugins are toggled per-project via .claude/coordinator.local.md.

Customization

Name your reviewers (optional). Role labels ship as the default — bash setup/name-personas.sh "the Staff Engineer" "Alex" "the Director of Engineering" "Jordan" binds chosen names to role labels across all plugin files. See the role table in docs/customization.md for all seven roles and their slugs.
Create your own domain reviewer. The shipped domain plugins are the reference pattern: web-dev ships a Front-end Reviewer and a UX Reviewer, data-science ships a Data Science Reviewer (ML/statistics). Each is a self-contained plugin with an agent prompt, a CLAUDE.md, and a .claude-plugin/plugin.json — copy any of them as a starting point for your own specialization.
Per-project configuration. Create .claude/coordinator.local.md with project_type to control which reviewers activate.

See docs/customization.md for templates, the full persona registry, and instructions for adding skills and CI checks.

Companion Plugins

clangd-lsp — C++ code intelligence. Reviewer agents gain go-to-definition, find-references, and call hierarchy ‒ helpful for those (like us) using Claude Code with Unreal Engine.
codex-plugin-cc — Codex CLI integration for parallel execution and second-opinion reviews.
Context7 — External library documentation lookup.

All are optional. Coordinator works without them; relevant features degrade gracefully.

Directory structure

coordinator-claude/
├── plugins/
│   ├── coordinator/            # Core orchestration (always enabled)
│   │   ├── .claude-plugin/plugin.json
│   │   ├── agents/             # 18 — enricher, executor, docs-checker, reviewers, eng-director, vp-product, reviewer-routed workers
│   │   ├── commands/           # 12 workflow commands (hook/ceremony auto-runners)
│   │   ├── hooks/              # context pressure, orientation, commit validation, tier-usage telemetry
│   │   ├── pipelines/          # staff-session team protocol + prompt templates
│   │   └── skills/             # 29 skills (planning, review, debugging, TDD, etc.)
│   ├── deep-research/          # Pipelines A/B/C + 6 research agents
│   │   └── notebooklm/         # Pipeline D (media research via NotebookLM)
│   ├── web-dev/                # Front-end + UX flow reviewers
│   └── data-science/           # ML, statistics reviewer (the Data Science Reviewer)
├── docs/                       # Architecture, customization, research
├── setup/                      # Installer
└── assets/                     # Social preview

Troubleshooting

Plugins not loading:

Check enabledPlugins in ~/.claude/settings.json — must be true
Check ~/.claude/plugins/installed_plugins.json — must have entry with correct installPath
Restart Claude Code (changes take effect on next session)
The installer (setup/install.sh) manages all config files automatically

Plugin cache not syncing after editing source:

Claude Code caches plugins by version. Run bash setup/dev-sync.sh to sync.

Per-project plugin selection:

Create .claude/coordinator.local.md with project_type field
Coordinator is always enabled; domain plugins activate per-project

Research

This system's design is informed by published research and validated through controlled experiments:

Handoff artifacts vs. compaction — why structured baton-passing beats automatic summarization
Named persona performance — evidence that named behavioral profiles improve review quality
Agent orchestration novelty assessment — honest assessment of all patterns against prior art
Anthropic multi-agent alignment — independent convergence with Anthropic's production research system

the PM O'Duffy & Claude

Name		Name	Last commit message	Last commit date
Latest commit History 322 Commits
.claude-plugin		.claude-plugin
.github		.github
assets		assets
docs		docs
evals		evals
plugins		plugins
setup		setup
tests/plugins		tests/plugins
.gitignore		.gitignore
.python-version		.python-version
AGENTS.md		AGENTS.md
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
COMMERCIAL.md		COMMERCIAL.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
PRIVACY.md		PRIVACY.md
README.md		README.md
SECURITY.md		SECURITY.md
version.txt		version.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

coordinator-claude

Who This Is For

What This Is Not

Quick Start

Compatibility

How a Session Works

What You Need to Remember

Flows

How heavy is the workflow?

Plugins

Customization

Companion Plugins

Research

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

coordinator-claude

Who This Is For

What This Is Not

Quick Start

Compatibility

How a Session Works

What You Need to Remember

Flows

How heavy is the workflow?

Plugins

Customization

Companion Plugins

Research

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages