Add /shipLane command and portable ship-lane playbook

arul28 · claude · arul28 · commit 68ef71aaea5a · 2026-04-23T14:41:01.000-04:00
Introduces an autonomous PR-to-merge driver that runs automate → finalize
once, then polls CI and review comments on a self-paced 12-min cadence,
fixing valid comments and failing tests in place. Prefers TeamCreate agent
teams when available and falls back to parallel Agent calls otherwise. Opens
the PR via `ade prs create` when possible so it shows up in ADE's PR
tracking; falls back to `gh pr create` only after the agent has genuinely
exhausted the ADE path via `--help`-driven discovery.

Also narrows /automate to run only the new and affected tests (not the
full suite), and makes /finalize's 8-shard parallel run explicit so shards
don't get chained serially.
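
For anyone running the suite from a plain shell rather than through concurrent tool calls, the parallel-shard idea can be sketched with background jobs. This is a minimal sketch and not the mechanism the command files prescribe: `run_shards` is a hypothetical helper name, and only the `--shard=<i>/<n>` flag is taken from the commands in this commit.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): launch every shard of a
# command concurrently, then report failure if any single shard failed.
run_shards() {
  local total="$1"; shift
  local pids=() failed=0 i pid
  for ((i = 1; i <= total; i++)); do
    "$@" "--shard=${i}/${total}" &   # each shard runs in the background
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || failed=1          # collect each shard's exit status
  done
  return "$failed"
}
```

From `apps/desktop` this would be invoked as `run_shards 8 npx vitest run`. Waiting on each PID separately, instead of a bare `wait`, is what lets one failing shard surface as a non-zero exit.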

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/.claude/commands/automate.md b/.claude/commands/automate.md
@@ -35,7 +35,7 @@ Phase 3: Parallel test writing           (agents)
          ├── desktop-tester-1..N  (desktop app tests)
          └── mcp-tester           (mcp server tests, if applicable)
 Phase 4: Test reality check              (lead, after all testers done)
-Phase 5: Full test run                   (lead)
+Phase 5: Scoped test run (new + affected) (lead)
 Phase 6: CI verification                 (lead)
 Phase 7: Summary                         (lead)
 ```
@@ -247,60 +247,40 @@ If issues are found, fix them directly.
 
 ---
 
-## Phase 5: Full Test Run
+## Phase 5: Scoped Test Run
 
-After reality check passes, run ALL created tests to confirm everything passes together.
+Verify the tests **this command just wrote** pass. Do NOT run the full suite — that is `/finalize`'s job, and running it here doubles the wait with no new signal.
 
-### 5a. Desktop tests (all new test files)
+### 5a. New test files together
 
-Run new test files together first:
+Run every test file created in Phase 3 in a single invocation:
 
 ```bash
 cd apps/desktop && npx vitest run [space-separated list of all new test files]
 ```
 
-### 5b. Desktop tests (full sharded run — match CI)
+All new tests must pass. If any fail, fix in place and re-run only the failing files.
 
-Run the full suite the same way CI does — sharded 8-way. Run all 8 shards in parallel:
+### 5b. Affected existing tests
 
-```bash
-cd apps/desktop && npx vitest run --shard=1/8
-cd apps/desktop && npx vitest run --shard=2/8
-cd apps/desktop && npx vitest run --shard=3/8
-cd apps/desktop && npx vitest run --shard=4/8
-cd apps/desktop && npx vitest run --shard=5/8
-cd apps/desktop && npx vitest run --shard=6/8
-cd apps/desktop && npx vitest run --shard=7/8
-cd apps/desktop && npx vitest run --shard=8/8
-```
-
-Or run a specific workspace project:
-
-```bash
-cd apps/desktop && npx vitest run --project unit-main
-cd apps/desktop && npx vitest run --project unit-renderer
-cd apps/desktop && npx vitest run --project unit-shared
-```
-
-### 5c. MCP server tests (if applicable)
-
-```bash
-cd apps/mcp-server && npm test
-```
-
-### 5d. Run affected existing tests
-
-If code changes could break existing tests (e.g., changed a service function's signature), run those existing test files too:
+If the branch's source changes could break existing tests (e.g., changed a service function's signature, renamed an exported type, altered shared contracts), run those existing test files — NOT the full suite:
 
 ```bash
 cd apps/desktop && npx vitest run [affected existing test files]
 ```
 
+Scope "affected" narrowly — direct importers of touched modules and their test siblings. Do not expand to "everything in the same feature folder."
+
 **If tests fail:**
 - Check if it's a flaky test (retry once)
 - If a specific test fails consistently, fix it and re-run only that file
 - Do NOT re-run all tests — only the failed ones
 
+### 5c. Not this command's job
+
+- **Full sharded suite run:** `/finalize` runs all 8 shards (and `test-ade-cli`) the same way CI does. Skip it here.
+- **Build / typecheck / lint:** also deferred to `/finalize`.
+
 ---
 
 ## Phase 6: CI Verification
@@ -354,9 +334,10 @@ Read `.github/workflows/ci.yml`. Verify:
 ### Test Files Created:
 - [List each file with test count]
 
-### Full Suite Run:
-- Desktop: PASS (X tests)
-- MCP Server: PASS (X tests)
+### Scoped Test Run:
+- New test files: PASS (X tests across Y files)
+- Affected existing tests: PASS (X tests) or N/A
+- NOTE: Full sharded suite run is deferred to `/finalize`.
 
 ### CI Coverage:
 - vitest.workspace.ts: All new tests matched by include patterns
@@ -394,7 +375,7 @@ Mark as **"completed"** ONLY if ALL of the following are true:
 
 1. ALL tests pass
 2. All applicable test types were created per gap tracker
-3. Full test run passed (Phase 5)
+3. Scoped test run passed (Phase 5 — new + affected only; full suite deferred to /finalize)
 4. CI covers all new test files (Phase 6)
 5. No tests with silent null guards
 6. No tests that mock the thing being tested
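
The narrow "affected" scope in Phase 5b (direct importers of touched modules) could be approximated with a plain `grep`. This is only a sketch under stated assumptions: `direct_importer_tests` is a hypothetical helper, it assumes ES-module style `from '.../name'` imports and `*.test.ts` / `*.test.tsx` file naming, and it does not cover the "test siblings" half of the rule.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): list test files under a
# directory that directly import a module, matched by the module's base name.
# Assumes ES-module imports and *.test.ts / *.test.tsx naming.
direct_importer_tests() {
  local module="$1" root="$2"
  grep -rlE --include='*.test.ts' --include='*.test.tsx' \
    "from ['\"][^'\"]*/${module}['\"]" "$root" | sort
}
```

The resulting list would take the place of `[affected existing test files]` in the Phase 5b command. Matching on the import path keeps the set to direct importers, so unrelated tests in the same feature folder are not pulled in.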
diff --git a/.claude/commands/finalize.md b/.claude/commands/finalize.md
@@ -285,35 +285,51 @@ cd apps/web && npm run typecheck
 cd apps/desktop && npm run lint
 ```
 
-### 3e. Desktop tests (sharded — match CI exactly)
+### 3e. Desktop tests — full suite, sharded 8-way, run in PARALLEL
 
-Shard like CI (8 shards in parallel) to avoid timeout. The workspace has 3 projects (`unit-main`, `unit-renderer`, `unit-shared`) — sharding runs across all of them automatically:
+`/finalize` is the gate that runs the whole test suite. Run **all 8 shards concurrently** — not sequentially. Running them serially takes 8× longer and masks real CI wall-clock behavior.
+
+The command must be identical to `.github/workflows/ci.yml` (job `test-desktop`, matrix shard 1–8, step at line 139):
+
+```yaml
+- run: cd apps/desktop && npx vitest run --shard=${{ matrix.shard }}/8
+```
+
+Locally that maps to 8 parallel Bash invocations in a single tool-call round:
 
 ```bash
-cd apps/desktop && npx vitest run --shard=1/8
-cd apps/desktop && npx vitest run --shard=2/8
-cd apps/desktop && npx vitest run --shard=3/8
-cd apps/desktop && npx vitest run --shard=4/8
-cd apps/desktop && npx vitest run --shard=5/8
-cd apps/desktop && npx vitest run --shard=6/8
-cd apps/desktop && npx vitest run --shard=7/8
-cd apps/desktop && npx vitest run --shard=8/8
+cd apps/desktop && npx vitest run --shard=1/8   # shard 1 of 8
+cd apps/desktop && npx vitest run --shard=2/8   # shard 2 of 8
+cd apps/desktop && npx vitest run --shard=3/8   # shard 3 of 8
+cd apps/desktop && npx vitest run --shard=4/8   # shard 4 of 8
+cd apps/desktop && npx vitest run --shard=5/8   # shard 5 of 8
+cd apps/desktop && npx vitest run --shard=6/8   # shard 6 of 8
+cd apps/desktop && npx vitest run --shard=7/8   # shard 7 of 8
+cd apps/desktop && npx vitest run --shard=8/8   # shard 8 of 8
 ```
 
-Or run specific projects when you only need a subset:
+Issue these as 8 concurrent Bash tool calls in a single message (one call per shard). Do not chain them with `&&` or `;` or run them one at a time. The workspace has 3 projects (`unit-main`, `unit-renderer`, `unit-shared`) — sharding distributes across all three automatically.
+
+If a shard fails, re-run **only that shard** (or, better, only the specific failing test file inside it). Never re-run all 8 shards to verify a one-file fix.
+
+Workspace-project subsets exist for debugging only; they are NOT a substitute for the sharded run in `/finalize`:
 
 ```bash
 cd apps/desktop && npx vitest run --project unit-main       # ~150+ main-process tests
 cd apps/desktop && npx vitest run --project unit-renderer    # ~85+ renderer tests
 cd apps/desktop && npx vitest run --project unit-shared      # ~7 shared/preload tests
 ```
 
-### 3f. ADE CLI tests
+### 3f. ADE CLI tests — separate CI job, run alongside the 8 shards
+
+CI runs `test-ade-cli` as its own parallel job (`.github/workflows/ci.yml:156`). Locally, include it in the same parallel tool-call round as the 8 desktop shards — it's effectively a 9th concurrent invocation, not something to run after:
 
 ```bash
 cd apps/ade-cli && npm test
 ```
 
+Do NOT run `apps/mcp-server` tests — the MCP server was removed; the agent-facing surface lives in `apps/ade-cli`.
+
 ### 3g. Build all apps
 
 ```bash
diff --git a/.claude/commands/shipLane.md b/.claude/commands/shipLane.md
@@ -0,0 +1,113 @@
+---
+name: shipLane
+description: 'Autonomously drive a lane through CI + review until merged (automate → finalize → poll/fix loop, self-paced wake-ups, max 5 iterations)'
+---
+
+# Ship Lane Command
+
+Drive the current lane from "work is ready" to "merged on main" without manual shepherding.
+
+**Usage:**
+- `/shipLane` — auto-detects state (existing PR on current branch, or needs initial push)
+- `/shipLane <pr-number>` — operate on a specific PR (useful if you checked out a different branch mid-loop)
+
+**Arguments:** $ARGUMENTS
+
+---
+
+## Source of truth
+
+**Follow the playbook at `docs/playbooks/ship-lane.md`.** All phase logic, state schema, commands, decision rules, and bot-ping rules live there. This wrapper only defines how Claude Code's team + wake-up primitives map onto the playbook.
+
+If you are re-invoked by a scheduled wake-up, read `.ade/shipLane/<sanitized-branch>.json` first. If `status == running`, skip Phase 0 and go straight to Phase 1.
+
+---
+
+## Execution mode: autonomous
+
+This command runs end-to-end without user interaction. Do NOT:
+- Ask the user to confirm, choose, or approve anything.
+- Pause between phases to request direction.
+- Stop on non-fatal warnings — log them and continue.
+- Ask whether to apply a fix — apply, verify, commit.
+
+The only user-visible output is the per-iteration summary and the final Phase 5 exit summary.
+
+---
+
+## Concurrency: TeamCreate is MANDATORY
+
+Check the available tools. If `TeamCreate` is in scope, you MUST use it. Do not fall back to `Agent` calls when a team is available.
+
+### Team composition
+
+Create one team at the start of the invocation and reuse it across iterations.
+
+```
+ship-lane team
+├── lead (this session's main agent)
+├── poll-agent         — runs every iteration, returns structured summary only
+├── rebase-agent       — spawned only when behindMain or conflicts exist
+├── ci-fix-agent       — spawned only when CI failures exist
+├── review-fix-agent   — spawned only when new valid comments exist
+└── conflict-resolver  — spawned by rebase-agent for >5-file conflicts
+```
+
+Initial team setup should also create:
+- `automate-agent` — invoked once in Phase 0 (only when there is no existing PR)
+- `finalize-agent` — invoked once in Phase 0 (only when there is no existing PR)
+
+### Delegation rules
+
+- The lead NEVER reads raw CI logs or full comment threads. It reads the poll-agent's structured summary (see playbook §1.3).
+- Fix agents get minimum scope: failing test paths + error snippets, or comment bodies + file anchors.
+- Fix agents edit files directly; they do not commit.
+- The lead commits and pushes after verifying `git diff`.
+- Rebase-agent runs alone when active — no concurrent file edits from other agents.
+
+### Fallback (TeamCreate not available)
+
+If `TeamCreate` is genuinely not in scope for this session:
+
+- Use parallel `Agent` tool calls for independent work (poll, ci-fix + review-fix in the same iteration).
+- Use serial `Agent` calls for rebase (must run alone) and Phase 0 setup (automate then finalize).
+- Same delegation rules apply — keep the lead's context clean by summarizing sub-agent output aggressively.
+
+---
+
+## Scheduling wake-ups
+
+Call `ScheduleWakeup` at the end of each iteration (playbook §5.3), passing the same command invocation as the `prompt` so the wake-up re-runs it:
+
+```
+ScheduleWakeup({
+  delaySeconds: <270 | 720 | 1800 per playbook>,
+  reason: "shipLane iter <N>: <CI running | waiting on review | just pushed>",
+  prompt: "/shipLane $ARGUMENTS"
+})
+```
+
+Pass `$ARGUMENTS` through so a PR-number argument is preserved across wake-ups.
+
+Do NOT schedule a wake-up if `status` is `done-clean`, `done-max`, or `blocked` — print the summary and stop.
+
+---
+
+## Phase 0 safety rails (Claude Code specific)
+
+Before running `automate-agent` and `finalize-agent` in Phase 0:
+
+1. Confirm `$ARGUMENTS` is empty OR matches a PR number on the current branch. If the PR number is for a different branch, `git checkout` to that branch first.
+2. Confirm `git status` shows no unexpected foreign changes. If the working tree has staged changes, commit them with `ship: checkpoint before automate/finalize` so the automate/finalize pipeline runs against a known baseline.
+3. Confirm `origin` is a GitHub remote (`git remote get-url origin`) — `gh pr create` needs it.
+
+If any rail fails, exit `blocked` with a clear reason in the state file and stop.
+
+---
+
+## References
+
+- `docs/playbooks/ship-lane.md` — full phase logic (source of truth).
+- `.claude/commands/automate.md` — invoked by `automate-agent` in Phase 0.
+- `.claude/commands/finalize.md` — invoked by `finalize-agent` in Phase 0.
+- `.github/workflows/ci.yml` — CI job names and shard count (`8`) that the local fallback tests mirror.
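
The stop rule in "Scheduling wake-ups" above reduces to a single status check. A minimal sketch follows; `should_schedule_wakeup` is a hypothetical name, passing the status as an argument is an assumption, and treating every non-terminal value as "keep looping" is also an assumption, since the full state schema lives in the playbook.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): exit 0 only when another
# wake-up should be scheduled. The three terminal values are the ones the
# wrapper names; everything else is assumed to mean "still in progress".
should_schedule_wakeup() {
  case "$1" in
    done-clean | done-max | blocked) return 1 ;;  # terminal: print summary and stop
    *) return 0 ;;                                # assumed non-terminal: schedule again
  esac
}
```

A caller would guard the `ScheduleWakeup` call with this check, so a terminal state never queues another iteration.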
diff --git a/.gitignore b/.gitignore
@@ -48,11 +48,13 @@ xcuserdata/
 apps/ios/.dry-run-derived-data/
 apps/ios/build/
 ios-signing/
+.asc/artifacts/
 
 # Tool configs (personal)
 .codex/
 .pnpm-store/
 /apps/desktop/.ade
+/.ade/shipLane/
 /.playwright-mcp
 /.codex-derived-data
 package-lock.json
diff --git a/AGENTS.md b/AGENTS.md
@@ -7,6 +7,10 @@
 - The ADE CLI lives in `apps/ade-cli` and shares core services with the desktop app.
 - State is primarily stored under `.ade/` inside the active project, with runtime metadata in SQLite and machine-local files under `.ade/secrets`, `.ade/cache`, and `.ade/artifacts`.
 
+## Playbooks
+
+- `docs/playbooks/ship-lane.md` — autonomous PR-to-merge driver (automate → finalize → poll-fix loop). Any agent CLI can follow it directly; Claude Code wraps it as `/shipLane`.
+
 ## Working norms
 
 - Preserve existing desktop app patterns before introducing new abstractions.
diff --git a/docs/playbooks/ship-lane.md b/docs/playbooks/ship-lane.md