
Commit 68ef71a

arul28 and claude committed
Add /shipLane command and portable ship-lane playbook
Introduces an autonomous PR-to-merge driver that runs automate → finalize once, then polls CI and review comments on a self-paced 12-min cadence, fixing valid comments and failing tests in place. Prefers TeamCreate agent teams when available, falls back to parallel Agent calls otherwise.

Opens the PR via `ade prs create` when possible so it shows up in ADE's PR tracking; falls back to `gh pr create` only after the agent has genuinely exhausted the ADE path via `--help`-driven discovery.

Also narrows /automate to run only the new and affected tests (not the full suite), and makes /finalize's 8-shard parallel run explicit so shards don't get chained serially.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 31563c9 commit 68ef71a

6 files changed

Lines changed: 585 additions & 51 deletions

File tree

.claude/commands/automate.md

Lines changed: 20 additions & 39 deletions
@@ -35,7 +35,7 @@ Phase 3: Parallel test writing (agents)
3535
├── desktop-tester-1..N (desktop app tests)
3636
└── mcp-tester (mcp server tests, if applicable)
3737
Phase 4: Test reality check (lead, after all testers done)
38-
Phase 5: Full test run (lead)
38+
Phase 5: Scoped test run (new + affected) (lead)
3939
Phase 6: CI verification (lead)
4040
Phase 7: Summary (lead)
4141
```
@@ -247,60 +247,40 @@ If issues are found, fix them directly.
247247

248248
---
249249

250-
## Phase 5: Full Test Run
250+
## Phase 5: Scoped Test Run
251251

252-
After reality check passes, run ALL created tests to confirm everything passes together.
252+
Verify the tests **this command just wrote** pass. Do NOT run the full suite — that is `/finalize`'s job, and running it here doubles the wait with no new signal.
253253

254-
### 5a. Desktop tests (all new test files)
254+
### 5a. New test files together
255255

256-
Run new test files together first:
256+
Run every test file created in Phase 3 in a single invocation:
257257

258258
```bash
259259
cd apps/desktop && npx vitest run [space-separated list of all new test files]
260260
```
261261

262-
### 5b. Desktop tests (full sharded run — match CI)
262+
All new tests must pass. If any fail, fix in place and re-run only the failing files.
263263

264-
Run the full suite the same way CI does — sharded 8-way. Run all 8 shards in parallel:
264+
### 5b. Affected existing tests
265265

266-
```bash
267-
cd apps/desktop && npx vitest run --shard=1/8
268-
cd apps/desktop && npx vitest run --shard=2/8
269-
cd apps/desktop && npx vitest run --shard=3/8
270-
cd apps/desktop && npx vitest run --shard=4/8
271-
cd apps/desktop && npx vitest run --shard=5/8
272-
cd apps/desktop && npx vitest run --shard=6/8
273-
cd apps/desktop && npx vitest run --shard=7/8
274-
cd apps/desktop && npx vitest run --shard=8/8
275-
```
276-
277-
Or run a specific workspace project:
278-
279-
```bash
280-
cd apps/desktop && npx vitest run --project unit-main
281-
cd apps/desktop && npx vitest run --project unit-renderer
282-
cd apps/desktop && npx vitest run --project unit-shared
283-
```
284-
285-
### 5c. MCP server tests (if applicable)
286-
287-
```bash
288-
cd apps/mcp-server && npm test
289-
```
290-
291-
### 5d. Run affected existing tests
292-
293-
If code changes could break existing tests (e.g., changed a service function's signature), run those existing test files too:
266+
If the branch's source changes could break existing tests (e.g., changed a service function's signature, renamed an exported type, altered shared contracts), run those existing test files — NOT the full suite:
294267

295268
```bash
296269
cd apps/desktop && npx vitest run [affected existing test files]
297270
```
298271

272+
Scope "affected" narrowly — direct importers of touched modules and their test siblings. Do not expand to "everything in the same feature folder."
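As a rough illustration of the "test siblings" half of that rule, the sketch below maps changed source paths to sibling test files that actually exist on disk. The `sibling_tests` helper and the `<name>.test.ts` / `<name>.test.tsx` naming are assumptions made for illustration, not something this command defines; finding direct importers would need a separate search.

```bash
# Hypothetical helper: read source paths on stdin and print the sibling
# test file for each one, but only when that file exists.
# Assumes tests sit next to their source as <name>.test.ts(x).
sibling_tests() {
  local src candidate
  while IFS= read -r src; do
    case "$src" in
      *.test.ts|*.test.tsx) candidate=$src ;;          # already a test file
      *.tsx) candidate="${src%.tsx}.test.tsx" ;;
      *.ts)  candidate="${src%.ts}.test.ts" ;;
      *) continue ;;                                   # not TypeScript, skip
    esac
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
    fi
  done
}
```

Run from `apps/desktop`, something like `git diff --name-only --relative origin/main...HEAD | sibling_tests` would list candidates to pass to `npx vitest run`; the `origin/main` base is also an assumption.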
273+
299274
**If tests fail:**
300275
- Check if it's a flaky test (retry once)
301276
- If a specific test fails consistently, fix it and re-run only that file
302277
- Do NOT re-run all tests — only the failed ones
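
The retry-once rule above can be sketched as a small wrapper. `run_with_one_retry` is a hypothetical name, and treating "passes on the second attempt" as flaky is an assumption about how you want to classify the failure.

```bash
# Hypothetical helper: run a command, and if it fails, run it exactly one
# more time. The return status is that of the last attempt.
run_with_one_retry() {
  "$@" && return 0
  echo "first attempt failed, retrying once: $*" >&2
  "$@"
}
```

Usage would look like `run_with_one_retry npx vitest run <failing test file>`, scoped to the failing file rather than the whole suite.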
303278

279+
### 5c. Not this command's job
280+
281+
- **Full sharded suite run:** `/finalize` runs all 8 shards (and `test-ade-cli`) the same way CI does. Skip it here.
282+
- **Build / typecheck / lint:** also deferred to `/finalize`.
283+
304284
---
305285

306286
## Phase 6: CI Verification
@@ -354,9 +334,10 @@ Read `.github/workflows/ci.yml`. Verify:
354334
### Test Files Created:
355335
- [List each file with test count]
356336
357-
### Full Suite Run:
358-
- Desktop: PASS (X tests)
359-
- MCP Server: PASS (X tests)
337+
### Scoped Test Run:
338+
- New test files: PASS (X tests across Y files)
339+
- Affected existing tests: PASS (X tests) or N/A
340+
- NOTE: Full sharded suite run is deferred to `/finalize`.
360341
361342
### CI Coverage:
362343
- vitest.workspace.ts: All new tests matched by include patterns
@@ -394,7 +375,7 @@ Mark as **"completed"** ONLY if ALL of the following are true:
394375

395376
1. ALL tests pass
396377
2. All applicable test types were created per gap tracker
397-
3. Full test run passed (Phase 5)
378+
3. Scoped test run passed (Phase 5 — new + affected only; full suite deferred to /finalize)
398379
4. CI covers all new test files (Phase 6)
399380
5. No tests with silent null guards
400381
6. No tests that mock the thing being tested

.claude/commands/finalize.md

Lines changed: 28 additions & 12 deletions
@@ -285,35 +285,51 @@ cd apps/web && npm run typecheck
285285
cd apps/desktop && npm run lint
286286
```
287287

288-
### 3e. Desktop tests (sharded — match CI exactly)
288+
### 3e. Desktop tests — full suite, sharded 8-way, run in PARALLEL
289289

290-
Shard like CI (8 shards in parallel) to avoid timeout. The workspace has 3 projects (`unit-main`, `unit-renderer`, `unit-shared`) — sharding runs across all of them automatically:
290+
`/finalize` is the gate that runs the whole test suite. Run **all 8 shards concurrently** — not sequentially. Running them serially takes 8× longer and masks real CI wall-clock behavior.
291+
292+
The command must be identical to `.github/workflows/ci.yml` (job `test-desktop`, matrix shard 1–8, step at line 139):
293+
294+
```
295+
- run: cd apps/desktop && npx vitest run --shard=${{ matrix.shard }}/8
296+
```
297+
298+
Locally that maps to 8 parallel Bash invocations in a single tool-call round:
291299

292300
```bash
293-
cd apps/desktop && npx vitest run --shard=1/8
294-
cd apps/desktop && npx vitest run --shard=2/8
295-
cd apps/desktop && npx vitest run --shard=3/8
296-
cd apps/desktop && npx vitest run --shard=4/8
297-
cd apps/desktop && npx vitest run --shard=5/8
298-
cd apps/desktop && npx vitest run --shard=6/8
299-
cd apps/desktop && npx vitest run --shard=7/8
300-
cd apps/desktop && npx vitest run --shard=8/8
301+
cd apps/desktop && npx vitest run --shard=1/8 # shard 1 of 8
302+
cd apps/desktop && npx vitest run --shard=2/8 # shard 2 of 8
303+
cd apps/desktop && npx vitest run --shard=3/8 # shard 3 of 8
304+
cd apps/desktop && npx vitest run --shard=4/8 # shard 4 of 8
305+
cd apps/desktop && npx vitest run --shard=5/8 # shard 5 of 8
306+
cd apps/desktop && npx vitest run --shard=6/8 # shard 6 of 8
307+
cd apps/desktop && npx vitest run --shard=7/8 # shard 7 of 8
308+
cd apps/desktop && npx vitest run --shard=8/8 # shard 8 of 8
301309
```
302310

303-
Or run specific projects when you only need a subset:
311+
Issue these as 8 concurrent Bash tool calls in a single message (one call per shard). Do not chain them with `&&` or `;` or run them one at a time. The workspace has 3 projects (`unit-main`, `unit-renderer`, `unit-shared`) — sharding distributes across all three automatically.
312+
313+
If a shard fails, re-run **only that shard** (or, better, only the specific failing test file inside it). Never re-run all 8 shards to verify a one-file fix.
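
For a person reproducing the run in a plain terminal (outside the agent's tool-call model, where the instruction is one Bash call per shard), comparable concurrency can be approximated with background jobs. This is a minimal sketch under that assumption; `run_shards` and `SHARD_LOG_DIR` are hypothetical names, not part of `/finalize`.

```bash
# Hypothetical helper: run_shards TOTAL CMD [ARGS...]
# Starts "CMD ARGS --shard=i/TOTAL" for i = 1..TOTAL as background jobs,
# waits for all of them, and prints the number of each failed shard.
# Returns 0 only if every shard passed.
run_shards() {
  local total=$1; shift
  local log_dir=${SHARD_LOG_DIR:-.}   # assumption: one log file per shard
  local -a pids=()
  local i failed=0
  for ((i = 1; i <= total; i++)); do
    "$@" "--shard=$i/$total" >"$log_dir/shard-$i.log" 2>&1 &
    pids[i]=$!
  done
  for ((i = 1; i <= total; i++)); do
    if ! wait "${pids[i]}"; then
      echo "$i"                       # report the failed shard number
      failed=1
    fi
  done
  return "$failed"
}
```

Something like `(cd apps/desktop && run_shards 8 npx vitest run)` would then print only the shard numbers that need a targeted re-run, which fits the "re-run only that shard" rule.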
314+
315+
Workspace-project subsets exist for debugging only; they are NOT a substitute for the sharded run in `/finalize`:
304316

305317
```bash
306318
cd apps/desktop && npx vitest run --project unit-main # ~150+ main-process tests
307319
cd apps/desktop && npx vitest run --project unit-renderer # ~85+ renderer tests
308320
cd apps/desktop && npx vitest run --project unit-shared # ~7 shared/preload tests
309321
```
310322

311-
### 3f. ADE CLI tests
323+
### 3f. ADE CLI tests — separate CI job, run alongside the 8 shards
324+
325+
CI runs `test-ade-cli` as its own parallel job (`.github/workflows/ci.yml:156`). Locally, include it in the same parallel tool-call round as the 8 desktop shards — it's effectively a 9th concurrent invocation, not something to run after:
312326

313327
```bash
314328
cd apps/ade-cli && npm test
315329
```
316330

331+
Do NOT run apps/mcp-server tests — the MCP server was removed; the agent-facing surface lives in `apps/ade-cli`.
332+
317333
### 3g. Build all apps
318334

319335
```bash

.claude/commands/shipLane.md

Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
1+
---
2+
name: shipLane
3+
description: 'Autonomously drive a lane through CI + review until merged (automate → finalize → poll/fix loop, self-paced wake-ups, max 5 iterations)'
4+
---
5+
6+
# Ship Lane Command
7+
8+
Drive the current lane from "work is ready" to "merged on main" without manual shepherding.
9+
10+
**Usage:**
11+
- `/shipLane` — auto-detects state (existing PR on current branch, or needs initial push)
12+
- `/shipLane <pr-number>` — operate on a specific PR (useful if you checked out a different branch mid-loop)
13+
14+
**Arguments:** $ARGUMENTS
15+
16+
---
17+
18+
## Source of truth
19+
20+
**Follow the playbook at `docs/playbooks/ship-lane.md`.** All phase logic, state schema, commands, decision rules, and bot-ping rules live there. This wrapper only defines how Claude Code's team + wake-up primitives map onto the playbook.
21+
22+
If you are re-invoked by a scheduled wake-up, read `.ade/shipLane/<sanitized-branch>.json` first. If `status == running`, skip Phase 0 and go straight to Phase 1.
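
A minimal sketch of that resume check, under stated assumptions: the sanitization rule (anything outside `[A-Za-z0-9._-]` becomes `-`) is a guess, since the real rule lives in the playbook, and all four function names are hypothetical. Only the `status` field and the `running` value come from this file.

```bash
# Assumption: characters outside [A-Za-z0-9._-] are replaced with "-".
sanitize_branch() {
  printf '%s\n' "${1//[^A-Za-z0-9._-]/-}"
}

# Path of the state file for a branch name.
state_file_for() {
  printf '.ade/shipLane/%s.json\n' "$(sanitize_branch "$1")"
}

# Crude extraction of the "status" string; prints nothing if the file is
# absent. A real JSON parser would be more robust.
read_status() {
  [ -f "$1" ] || return 0
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# "running" resumes at Phase 1; anything else starts at Phase 0.
starting_phase() {
  if [ "$(read_status "$1")" = "running" ]; then echo 1; else echo 0; fi
}
```

For example, `starting_phase "$(state_file_for "$(git branch --show-current)")"` would print the phase to begin with on the current branch.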
23+
24+
---
25+
26+
## Execution mode: autonomous
27+
28+
This command runs end-to-end without user interaction. Do NOT:
29+
- Ask the user to confirm, choose, or approve anything.
30+
- Pause between phases to request direction.
31+
- Stop on non-fatal warnings — log them and continue.
32+
- Ask whether to apply a fix — apply, verify, commit.
33+
34+
The only user-visible output is the per-iteration summary and the final Phase 5 exit summary.
35+
36+
---
37+
38+
## Concurrency: TeamCreate is MANDATORY
39+
40+
Check the available tools. If `TeamCreate` is in scope, you MUST use it. Do not fall back to `Agent` calls when a team is available.
41+
42+
### Team composition
43+
44+
Create one team at the start of the invocation, reuse it across iterations.
45+
46+
```
47+
ship-lane team
48+
├── lead (this session's main agent)
49+
├── poll-agent — runs every iteration, returns structured summary only
50+
├── rebase-agent — spawned only when behindMain or conflicts exist
51+
├── ci-fix-agent — spawned only when CI failures exist
52+
├── review-fix-agent — spawned only when new valid comments exist
53+
└── conflict-resolver — spawned by rebase-agent for >5-file conflicts
54+
```
55+
56+
Initial team setup should also create:
57+
- `automate-agent` — invoked once in Phase 0 (only when there is no existing PR)
58+
- `finalize-agent` — invoked once in Phase 0 (only when there is no existing PR)
59+
60+
### Delegation rules
61+
62+
- The lead NEVER reads raw CI logs or full comment threads. It reads the poll-agent's structured summary (see playbook §1.3).
63+
- Fix agents get minimum scope: failing test paths + error snippets, or comment bodies + file anchors.
64+
- Fix agents edit files directly; they do not commit.
65+
- The lead commits and pushes after verifying `git diff`.
66+
- Rebase-agent runs alone when active — no concurrent file edits from other agents.
67+
68+
### Fallback (TeamCreate not available)
69+
70+
If `TeamCreate` is genuinely not in scope for this session:
71+
72+
- Use parallel `Agent` tool calls for independent work (poll, ci-fix + review-fix in the same iteration).
73+
- Use serial `Agent` calls for rebase (must run alone) and Phase 0 setup (automate then finalize).
74+
- Same delegation rules apply — keep the lead's context clean by summarizing sub-agent output aggressively.
75+
76+
---
77+
78+
## Scheduling wake-ups
79+
80+
Use `ScheduleWakeup` at the end of each iteration (playbook §5.3) with the same command re-invocation as the `prompt`:
81+
82+
```
83+
ScheduleWakeup({
84+
delaySeconds: <270 | 720 | 1800 per playbook>,
85+
reason: "shipLane iter <N>: <CI running | waiting on review | just pushed>",
86+
prompt: "/shipLane $ARGUMENTS"
87+
})
88+
```
89+
90+
Pass `$ARGUMENTS` through so a PR-number argument is preserved across wake-ups.
91+
92+
Do NOT schedule a wake if `status` is `done-clean`, `done-max`, or `blocked` — print the summary and stop.
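
A guard along these lines could make the stop condition explicit. The function names are hypothetical; the status values and the three delays are the ones listed in this file, and which delay applies to which situation is left to the playbook.

```bash
# Returns 0 when no further wake-up should be scheduled.
is_terminal_status() {
  case "$1" in
    done-clean|done-max|blocked) return 0 ;;
    *) return 1 ;;
  esac
}

# Returns 0 when the delay (in seconds) is one of the three allowed values.
is_allowed_delay() {
  case "$1" in
    270|720|1800) return 0 ;;
    *) return 1 ;;
  esac
}
```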
93+
94+
---
95+
96+
## Phase 0 safety rails (Claude Code specific)
97+
98+
Before running `automate-agent` and `finalize-agent` in Phase 0:
99+
100+
1. Confirm `$ARGUMENTS` is empty OR matches a PR number on the current branch. If the PR number is for a different branch, `git checkout` to that branch first.
101+
2. Confirm `git status` shows no unexpected changes. If the working tree has staged changes, commit them with `ship: checkpoint before automate/finalize` so the automate/finalize pipeline runs against a known baseline.
102+
3. Confirm `origin` is a GitHub remote (`git remote get-url origin`) — `gh pr create` needs it.
103+
104+
If any rail fails, exit `blocked` with a clear reason in the state file and stop.
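
Rails 2 and 3 can be approximated with two small checks. This is a sketch, not the command's actual implementation: both function names are hypothetical, and the URL match assumes a `github.com` host, so GitHub Enterprise remotes would need another pattern.

```bash
# Returns 0 when the remote URL points at github.com (HTTPS or SSH forms).
is_github_url() {
  case "$1" in
    https://github.com/*|git@github.com:*|ssh://git@github.com/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Returns 0 when the index has staged changes.
# Assumes the current directory is inside a git work tree.
has_staged_changes() {
  ! git diff --cached --quiet
}
```

For example, `is_github_url "$(git remote get-url origin)" || echo "blocked: origin is not a GitHub remote"` would surface a failure of rail 3 before `gh pr create` is attempted.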
105+
106+
---
107+
108+
## References
109+
110+
- `docs/playbooks/ship-lane.md` — full phase logic (source of truth).
111+
- `.claude/commands/automate.md` — invoked by `automate-agent` in Phase 0.
112+
- `.claude/commands/finalize.md` — invoked by `finalize-agent` in Phase 0.
113+
- `.github/workflows/ci.yml` — CI job names and shard count (`8`) that the local fallback tests mirror.

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -48,11 +48,13 @@ xcuserdata/
4848
apps/ios/.dry-run-derived-data/
4949
apps/ios/build/
5050
ios-signing/
51+
.asc/artifacts/
5152

5253
# Tool configs (personal)
5354
.codex/
5455
.pnpm-store/
5556
/apps/desktop/.ade
57+
/.ade/shipLane/
5658
/.playwright-mcp
5759
/.codex-derived-data
5860
package-lock.json

AGENTS.md

Lines changed: 4 additions & 0 deletions
@@ -7,6 +7,10 @@
77
- The ADE CLI lives in `apps/ade-cli` and shares core services with the desktop app.
88
- State is primarily stored under `.ade/` inside the active project, with runtime metadata in SQLite and machine-local files under `.ade/secrets`, `.ade/cache`, and `.ade/artifacts`.
99

10+
## Playbooks
11+
12+
- `docs/playbooks/ship-lane.md` — autonomous PR-to-merge driver (automate → finalize → poll-fix loop). Any agent CLI can follow it directly; Claude Code wraps it as `/shipLane`.
13+
1014
## Working norms
1115

1216
- Preserve existing desktop app patterns before introducing new abstractions.
