TL;DR
After DESIGN produces experiment_plan.yaml, fan out arms to subagents in parallel, each in its own git worktree. Reduce wall-clock 2–4× on multi-arm iterations and eliminate the 90-minute "single mega-session" failure mode.
Why this matters
mech-design-enforcement iter-2 ran 8 conditions × 3 seeds = 24 simulations sequentially in one Sonnet session. That 2.5-hour session was the proximate cause of the connection drops the user complained about on 5/18 — and when one of those connection drops triggered --max-cli-retries 10, a second worktree spawned while the first was still alive, leading to two executors racing on the same iter-2/results/ directory.
Subagents share the parent's CLAUDE.md / skills / auto-memory, so per-arm context is small. Each arm is independent (different seed / different policy / same patched binary). Worktree-per-arm gives clean isolation.
What's already shipped
Proposed approach
- Parse
experiment_plan.yaml into a list of (arm, seed, command) units.
- For each unit, spawn a subagent via the SDK with:
- Cap concurrency by a campaign-level
max_parallel_arms: setting (default = min(CPU, 4)).
- Parent agent awaits all subagents, then runs the existing deterministic merge into
findings.json and principle_updates.json.
- On partial failure, retry only the failed units (not the whole iteration).
Acceptance criteria
Out of scope
Part of #120.
TL;DR
After DESIGN produces
experiment_plan.yaml, fan out arms to subagents in parallel, each in its own git worktree. Reduce wall-clock 2–4× on multi-arm iterations and eliminate the 90-minute "single mega-session" failure mode.Why this matters
mech-design-enforcementiter-2 ran 8 conditions × 3 seeds = 24 simulations sequentially in one Sonnet session. That 2.5-hour session was the proximate cause of the connection drops the user complained about on 5/18 — and when one of those connection drops triggered--max-cli-retries 10, a second worktree spawned while the first was still alive, leading to two executors racing on the sameiter-2/results/directory.Subagents share the parent's CLAUDE.md / skills / auto-memory, so per-arm context is small. Each arm is independent (different seed / different policy / same patched binary). Worktree-per-arm gives clean isolation.
What's already shipped
worktree.pyalready manages per-iteration worktrees.Proposed approach
experiment_plan.yamlinto a list of(arm, seed, command)units.iter-N/inputs/and the patched binary.iter-N/results/<arm>/<seed>/.max_parallel_arms:setting (default = min(CPU, 4)).findings.jsonandprinciple_updates.json.Acceptance criteria
examples/campaign-best-of-field.yamlruns end-to-end withmax_parallel_arms: 4in significantly less wall-clock than the serial baseline.arm_status: "failed"infindings.json.results/subpath.Out of scope
Part of #120.