A PM-native operating layer for AI engineering work. An opinionated Claude Code plugin that turns product intent into scoped plans, delegated implementation, evidence, and ship/no-ship decisions. You're PM. Claude is EM. You stay technical enough to spot when something's wrong; the EM handles the rest.
You don't need to write code. You need to know enough to evaluate it — to spot when something smells wrong, to ask the right questions, to set direction. Amjad Masad (Replit's founder) opined that people who don't program may actually be better positioned for the LLM-coding world: they won't micromanage implementation. This plugin leans into that. It gives Claude standing authority to make engineering decisions — how to build, who to delegate to, when to refactor — while you hold product authority: what to build, what to cut, what ships. Just like an EM-PM dynamic.
What this system asks of the PM is altitude, not tactics. Architectural judgment, scope decisions, refactor-vs-patch calls, what-ships-and-what-doesn't. Slop is fought on the plan level — by treating each plan as an exhaustive software design document with named-reviewer pushback baked in, not by spot-checking commits afterward. The natural model defaults — act fast and move on, defer P2s into oblivion, size work in human-sprint timeframes — are the things this system is built to push back against. The PM sets direction; the EM and the named reviewers do the pushing.
This isn't a collection of prompt tricks. It's a decision architecture: routines (session orientation, plan review, multi-source code review, structured handoffs, daily flows) plus authority boundaries (EM owns implementation, PM owns scope/ship) plus reviewer personas that push back with fix-gates between them. Reviews are meaningful — blocking findings, not theatre. And the system actively reads its own accumulated wikis and lessons before drafting new plans (prior-art-checker pre-flight, /learn-lessons triage), so the scar tissue from past decisions informs the next one rather than sitting in a write-only memory file. It maps to how PM-EM collaboration actually feels in a real engineering team. The difference is that your "team" can work autonomously for hours, and you can review the output when you're ready.
- Not another agent framework. Claude Code already has subagents, plugins, skills, hooks, and MCP. This sits on top of that infrastructure — it doesn't compete with it.
- Not a PRD-to-code pipeline. Product intent is captured inside plan mode and acceptance criteria, not in a separate PRD funnel.
- Not an autonomous coding agent. The PM is in the loop on scope, tradeoffs, and ship decisions by design. Autonomous modes (
/mise-en-place,/autonomous) exist for execution sprints, not for product authority. - Not a developer productivity tool. The target user evaluates engineering work — they don't write it. If you want to be more productive at writing code yourself, you want vanilla Claude Code; this is for managing AI engineering work, not doing it.
- Not aimed at the fully non-technical PM (yet). The current sweet spot is a PM with technical altitude but not technical hands — someone who has architectural judgment, recognizes unwise plan-shape, and can spot when a plan, an architectural choice, or a tradeoff sounds wrong from how the EM describes it, without ever reading a diff. The PM does not review commits, does not perform code review, does not sign off on PRs. That altitude is the load-bearing constraint: code review and PR sign-off are delegated to AI peers and named-persona staff-tier reviewers (see below). Complex codebases require a PM who can tell when something looks wrong from the conversation, not from the patch.
🤖 Agents: start here → AGENTS.md
You don't install this — your agent does. Open Claude Code in any project and paste:
Install coordinator-claude. The playbook is at
https://github.com/dbc-oduffy/coordinator-claude/blob/main/docs/agent-install.md
— read it, follow it, and queue /project-onboarding as the immediate next step
after I restart Claude Code.
Claude clones the repo, runs the installer, validates the result, and tells you when to restart. After restart, /project-onboarding bootstraps tracking infrastructure in your project.
Auditing & uninstall → docs/safety.md — what the installer changes, what it does not do, audit commands, and exact uninstall steps.
| coordinator-claude | Claude Code | OS tested | Notes |
|---|---|---|---|
| v2.0.0 | tested with Claude Code 2026-05-07 release; minimum version not formally established | macOS, Linux, WSL, Windows (Git Bash) | Reference: setup/install.sh. Requires bash 4.0+ — macOS ships bash 3.2 as /bin/bash, so Mac users brew install bash and run the installer with it ("$(brew --prefix)/bin/bash" setup/install.sh); the installer fails fast with this guidance if run under 3.2. Linux/WSL/Git Bash ship bash 4+. v2.0.0 has breaking changes — see CHANGELOG. |
Most tools hand you a bag of commands and wish you luck. This system has routines — woven into the session lifecycle so you don't have to remember them. Some fire automatically via lifecycle hooks (boot orientation, the context-pressure nudge); the rest are one-keystroke ceremonies the EM runs at the right moment. The distinction matters: hooks are what actually fire on their own; the slash commands below are invoked — by you, or by the EM on your behalf.
Starting up. When Claude opens a supported project, a SessionStart hook fires automatically — loading the current branch, pending handoffs, lessons from past sessions, project vitals, and an orientation cache. No cold start. Claude lands in the middle of the context window where performance is strongest, with forward-looking state already loaded. This is deliberate: research shows LLMs degrade toward the end of their context and, to a lesser extent, at the beginning (Liu et al. 2023, "Lost in the Middle"). The orientation hook front-loads context so the working session occupies the optimal window.
Planning. You describe what you want. Claude enters plan mode — but the plan isn't just written and executed. You review it. In a real dev team, the PM doesn't just say "build auth" and disappear; they review the spec, push back on scope, ask about edge cases. That's what happens here. The coordinator:plan skill is a decision-tree super-skill (triage → substrate → compose → exit) that mechanically binds the trigger word "plan" to the full pipeline: the prior-art-checker pre-flight cross-references the plan against accumulated wikis, lessons, and improvement queues to surface conflicts before an Opus reviewer touches it; the Staff Engineer then reviews; the integrator applies findings. For bigger decisions, /staff-session spawns role-based engineers who independently develop positions and debate to consensus — like pulling your tech lead and director of engineering into a room.
Building. Claude delegates to Sonnet subagents for implementation — cheaper, faster, fresh context. A PreToolUse hook nudges Claude away from doing implementation work directly, because the orchestrator's context is too valuable to spend on writing code. This is the same principle as a real EM: you don't want your engineering manager writing production code when they should be coordinating.
Reviewing. Code review comes from named personas with rich behavioral profiles — a domain expert reviews first (e.g., a web-dev specialist for front-end work, or a data-science specialist for ML pipelines), all findings are applied, then a generalist reviews the clean artifact. Sequential, with mandatory fix gates. Research supports both the persona mechanism (literature-backed for judgment-routing tasks; for mechanical bug-finding our own controlled experiment found no recall gain — that's why bug-sweeps use bare agents) and multi-agent review gains (Anthropic's own eval showed 90.2% improvement over single-agent). Plan-review with personas — the system's main use of named reviewers — leans on industry-standard PRD/SDD review patterns; we have not separately benchmarked it.
Staying coherent. Long sessions hit a hard constraint: context compaction. When triggered, the model summarizes what it thinks happened — a retrospective reconstruction that loses intent. A PostToolUse hook monitors context pressure and prompts Claude to create a structured handoff before compaction fires: decisions made, state reached, explicit next steps. Each handoff chains forward from its predecessor. Research shows structured handoffs significantly outperform automatic summarization (Kang et al. 2025, ACON; Sourcegraph retired compaction in their Amp agent in favor of explicit handoffs after measuring degradation).
Navigating the codebase. This system invests heavily in documentation-as-navigation. Claude's natural mode is grep-heavy — searching text, reading prose, following paper trails. An architecture atlas, project tracker, orientation caches, and structured comments throughout the codebase give Claude something to find when it searches. We call this "grep bait." It's why the doc maintenance pipeline and architecture atlas exist: not bureaucracy, but navigation infrastructure that lets Claude plan from 60 lines of orientation instead of reading 20 source files cold. Research artifacts, lessons files, and handoff documents all serve double duty — they record decisions and create searchable landmarks for future sessions.
Wrapping up. /session-end captures lessons, updates documentation, and commits state. /workday-complete goes further: syncs docs, merges to main via PR, and optionally hibernates the machine. The cycle is continuous — each session starts where the last one left off.
The EM handles most of this on its own. Your key moves:
| Command | When | What It Does |
|---|---|---|
/session-start |
Beginning of work | Deliberate orientation — triage handoffs, surface staleness, choose work. PM-invoked; not auto-fired. (Boot context loads automatically via a separate SessionStart hook; this command is the optional deeper orient.) |
/pickup |
Resuming work | Load a handoff artifact and continue where you left off |
/handoff |
Stepping away | Save session state for the next session |
/staff-session |
Big decisions | Multi-perspective planning or review from persona-based contributors |
/mise-en-place |
Heads-down time | Claude burns through the backlog autonomously — no input needed |
/autonomous |
Override | Suppress handoff nudges when you want Claude to push through compaction |
That's it for daily use. Everything else — delegation, review routing, doc maintenance — the EM drives as part of its workflow; context-pressure management is the one piece handled automatically, by a PostToolUse hook. You don't have to ask for any of it.
Don't memorize commands; learn five flows. Most of what the system does, you'll touch through one of these.
Flow 1 — Build a feature. You describe intent → Claude enters plan mode and proposes acceptance criteria + scope mode → you review and approve → Claude delegates implementation → reviewers (domain expert first, generalist second) check the artifact with fix gates between → for user-visible work or patches that smell like they should be refactors, the VP-Product Reviewer (coordinator:vp-product) — scope challenger, naming optional via setup/name-personas.sh — stress-tests the choice → /merging-to-main produces a ship verdict and you decide.
Flow 2 — Fix a bug. Reproduction first (don't trust the report) → root cause via the systematic-debugging guide → scoped fix in production-patch mode (minimal diff, no opportunistic refactors) → regression check → reviewer → merge. For codebase-wide grinds, /bug-blitz autonomously works through the bug backlog with EM-serial commits at each wave gate.
Flow 3 — Resume work. The boot SessionStart hook automatically loads orientation, lessons, and pending handoffs into context → optionally run /session-start for a deeper triage → you pick up via /pickup <handoff> or pick from the menu → Claude lands mid-context and starts where the last session stopped.
Flow 4 — Autonomous sprint. /mise-en-place gathers ready work, builds a compaction-proof flight recorder, and executes through the backlog without stopping. /autonomous suppresses handoff nudges when you want it to push through context pressure.
Flow 5 — Architecture change. /staff-session plan (multi-perspective debate) → migration plan with rollback → architecture mode review → implementation → verification → architecture atlas update via /update-docs.
Closing the day: /workday-complete validates, syncs, runs the daily review, and merges. /workweek-complete is the larger weekly ceremony with version bump and release notes.
The system scales — a typo fix is a two-word instruction; a system rewrite is a multi-agent pipeline. Here are three representative tiers:
| Tier | Skill / command | Reviewer | Wall time |
|---|---|---|---|
| Tiny edit (typo, constant, rename) | Direct EM edit — no plan | None required | < 5 min |
| Feature (new command, new skill) | /execute-plan after PM approves a plan |
Domain reviewer → the Staff Engineer (coordinator:staff-eng) generalist; the Director of Engineering (coordinator:eng-director) as backstop at High effort (sequential) |
30 min – 2 hrs |
| System rewrite (multi-plugin overhaul) | /staff-session plan → /execute-plan (with executor dispatch per docs/wiki/delegate-execution.md) |
Full sequential chain + PM ship verdict | Half day+ |
See docs/wiki/task-tier-guidance.md for the full tier table, reviewer routing guide, and flow diagrams.
All Commands (appendix)
| Command | Purpose | Why It Exists |
|---|---|---|
/session-start |
Orient to project, load context, choose work | Eliminate cold starts; position key context optimally |
/session-end |
Capture lessons, update docs, commit | Preserve institutional knowledge between sessions |
/pickup |
Resume from a handoff artifact | Structured continuity beats re-reading git log |
/handoff |
Save state before stepping away | Prospective capture > retrospective reconstruction |
/staff-session |
Multi-perspective planning or review | Debate surfaces tradeoffs a solo agent misses |
/mise-en-place |
Autonomous backlog execution | Front-load all context, then execute without interruption |
/autonomous |
Toggle autonomous mode | Trust escalation — suppress handoff nudges |
/workday-start |
Morning orientation and triage | Surface staleness, align priorities, suggest work |
/workday-complete |
End-of-day wrap-up | Daily housekeeping: validate, consolidate, append week-changelog |
/workweek-start |
Strategic weekly orient | Surface last week's results, set this week's priorities |
/workweek-complete |
Release-grade weekly close | Full validation, version bump, parallel code review, merge to main |
/review |
Review a plan / design / RFC | Domain expert → generalist, sequential with fix gates |
/review-code |
Review a code change / diff / PR | Same reviewer pipeline, code-shaped artifacts |
/execute-plan |
Execute a PM-approved plan | Direct implementation without re-planning |
/update-docs |
Repo-wide documentation maintenance | 11-phase pipeline that fights doc staleness |
/bug-sweep |
Systematic codebase bug hunt | Find and fix all AI-fixable bugs in-session |
/bug-blitz |
Autonomous bug-backlog grinder | Wave-based execution with EM-serial commits at each gate |
/dogfood |
Smoke-driven fix-through loop | Binary outcome: converge on a working capability or switch gears |
/learn-lessons |
Triage tasks/lessons.md as doctrine change-requests |
Three modes: local (per-repo), central (cross-repo), recheck (deferred follow-ups) |
/code-health |
Night-shift code health review | Scan today's commits, dispatch reviewer, apply findings |
/architecture-audit |
Deep architecture analysis | Multi-phase agent pipeline for system health |
/distill |
Extract knowledge from session artifacts | Turn plans and handoffs into evergreen wiki docs |
How this maps to real teams
| Real Team Practice | coordinator-claude Equivalent |
|---|---|
| Daily standup | /session-start — what happened, what's blocked, what's next |
| Sprint planning | /staff-session plan — persona-based engineers debate approach |
| Spec review | Plan mode with PM sign-off — Claude proposes, you approve |
| Code review | /review-code — domain expert first, generalist second, fix gates between |
| Tech lead gut-check | Any reviewer can be dispatched ad-hoc for a quick assessment |
| Retrospective | /session-end — capture lessons, update docs |
| Heads-down sprint | /mise-en-place — autonomous execution through the backlog |
| Handoff between shifts | /handoff — structured state capture, not "check the git log" |
The one role we don't have deeply embedded in workflows: designer. Meatspace designers are still better at that, and their judgment is going to remain difficult for LLMs to replicate. There's a vibe-design functionality, but it's not gonna rock your world.
Under the hood — architecture details
Inverted capability delegation. The coordinator sees ~8 thin MCP tools; domain agents access 40+ via proxy with full schemas. This saves ~40K tokens from the coordinator's context and forces delegation by design. A PreToolUse hook nudges the coordinator when it reaches for domain tools directly.
Proactive artifact generation. Before compaction fires, a hook prompts structured handoff creation — a prospective document capturing decisions, state, and next steps. Each artifact chains from its predecessor (cascade obligation) and opens with a synthesis of the prior handoff (anti-amnesia chain). See our handoff vs. compaction research.
Role-based sequential review. Reviewers carry rich behavioral profiles — not just "code reviewer" but distinct roles with expertise domains and review lenses. Sequential review with mandatory fix gates means each reviewer sees a clean artifact. See the persona research and experiment results. Role labels ship as defaults; an optional naming flow lets users bind personal names to roles if that aids their cognitive ease.
6-layer project knowledge. Structure, architecture, activity, temporal, intent, state — none bulk-loaded. A tiered context model loads a ~60-line orientation cache at L1, pulls detailed artifacts on demand at L2, and reserves L3 for deep storage read by subagents. An 11-phase maintenance pipeline fights doc staleness automatically.
Agent Teams for planning. Claude Code's Agent Teams enables multiple Claude sessions that communicate and coordinate. This system uses it for multi-perspective planning: persona-based debaters form independent positions, challenge each other, and a synthesizer cross-references into consensus. Also powers the bundled deep-research pipelines (internet, repo, structured, NotebookLM).
Cross-model delegation. Haiku for mechanical checks, Sonnet for most execution, Opus for judgment and synthesis. Codex CLI integration is available as an opt-in add-on (setup/install.sh --enable-codex plus the external openai/codex-plugin-cc plugin) for a second-opinion channel and independent implementation path; default installs omit it.
See docs/architecture.md for the full model. For the testable claims — what each agent role promises and which hook or script enforces each promise — see docs/contracts.md. For broader context, the novelty research assesses all patterns against published prior art.
For the failure modes this system was designed around — false completion, silent scope expansion, test theater, review laundering, context amnesia, integration blindness, and a dozen others — see docs/evolution/05-failure-modes.md.
For evidence — what was actually built under this workflow, alongside the controlled persona experiment and the qualitative failure-modes ledger — see docs/research/2026-05-08-built-with-coordinator.md, docs/research/2026-03-26-persona-experiment-results.md, and docs/evolution/05-failure-modes.md. These are the three peer artifacts in the evidence corpus.
| Plugin | Purpose | When to Enable |
|---|---|---|
| coordinator | Core orchestration, reviewers, all workflow skills | Always |
| deep-research | Multi-agent research pipelines — internet (A), repo (B), structured (C) | Any project that needs grounded research |
| notebooklm | NotebookLM-backed research pipeline (D) — YouTube, podcasts, audio sources | When you need media Claude can't read directly |
| web-dev | Front-end architecture review + UX flow review | Web projects |
| data-science | ML, statistics, data modeling review | ML/data work |
The coordinator plugin is always enabled. Domain plugins are toggled per-project via .claude/coordinator.local.md.
- Name your reviewers (optional). Role labels ship as the default —
bash setup/name-personas.sh "the Staff Engineer" "Alex" "the Director of Engineering" "Jordan"binds chosen names to role labels across all plugin files. See the role table in docs/customization.md for all seven roles and their slugs. - Create your own domain reviewer. The shipped domain plugins are the reference pattern: web-dev ships a Front-end Reviewer and a UX Reviewer, data-science ships a Data Science Reviewer (ML/statistics). Each is a self-contained plugin with an agent prompt, a
CLAUDE.md, and a.claude-plugin/plugin.json— copy any of them as a starting point for your own specialization. - Per-project configuration. Create
.claude/coordinator.local.mdwithproject_typeto control which reviewers activate.
See docs/customization.md for templates, the full persona registry, and instructions for adding skills and CI checks.
- clangd-lsp — C++ code intelligence. Reviewer agents gain go-to-definition, find-references, and call hierarchy ‒ helpful for those (like us) using Claude Code with Unreal Engine.
- codex-plugin-cc — Codex CLI integration for parallel execution and second-opinion reviews.
- Context7 — External library documentation lookup.
All are optional. Coordinator works without them; relevant features degrade gracefully.
Directory structure
coordinator-claude/
├── plugins/
│ ├── coordinator/ # Core orchestration (always enabled)
│ │ ├── .claude-plugin/plugin.json
│ │ ├── agents/ # 18 — enricher, executor, docs-checker, reviewers, eng-director, vp-product, reviewer-routed workers
│ │ ├── commands/ # 12 workflow commands (hook/ceremony auto-runners)
│ │ ├── hooks/ # context pressure, orientation, commit validation, tier-usage telemetry
│ │ ├── pipelines/ # staff-session team protocol + prompt templates
│ │ └── skills/ # 29 skills (planning, review, debugging, TDD, etc.)
│ ├── deep-research/ # Pipelines A/B/C + 6 research agents
│ │ └── notebooklm/ # Pipeline D (media research via NotebookLM)
│ ├── web-dev/ # Front-end + UX flow reviewers
│ └── data-science/ # ML, statistics reviewer (the Data Science Reviewer)
├── docs/ # Architecture, customization, research
├── setup/ # Installer
└── assets/ # Social preview
Troubleshooting
Plugins not loading:
- Check
enabledPluginsin~/.claude/settings.json— must betrue - Check
~/.claude/plugins/installed_plugins.json— must have entry with correctinstallPath - Restart Claude Code (changes take effect on next session)
- The installer (
setup/install.sh) manages all config files automatically
Plugin cache not syncing after editing source:
- Claude Code caches plugins by version. Run
bash setup/dev-sync.shto sync.
Per-project plugin selection:
- Create
.claude/coordinator.local.mdwithproject_typefield - Coordinator is always enabled; domain plugins activate per-project
This system's design is informed by published research and validated through controlled experiments:
- Handoff artifacts vs. compaction — why structured baton-passing beats automatic summarization
- Named persona performance — evidence that named behavioral profiles improve review quality
- Agent orchestration novelty assessment — honest assessment of all patterns against prior art
- Anthropic multi-agent alignment — independent convergence with Anthropic's production research system
the PM O'Duffy & Claude