Summary
Add a parallelism knob to the campaign config that the executor agent uses to run independent conditions concurrently within a single arm. No multiple worktrees, no orchestrator-level concurrency — the parallelism lives entirely in the agent's bash execution inside the existing single worktree.
Motivation
A hypothesis bundle with N arms × K seeds × 2 conditions (baseline + treatment) produces N × K × 2 sequential blis run invocations today. These runs are fully independent: same binary, different flags, different output paths. On a machine with 8+ cores, a parallelism: 4 hint could cut experiment wall-time by 3–4×, directly reducing --timeout pressure and the probability of mid-iteration failures (#80).
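The 3–4× figure can be sanity-checked with a back-of-the-envelope model. The sketch below assumes that every run takes roughly the same time and that runs execute in waves of at most `parallelism` jobs. `ideal_speedup` is a hypothetical helper for illustration only, not part of the codebase.

```python
import math


def ideal_speedup(runs: int, parallelism: int) -> float:
    """Best-case speedup when `runs` equal-length runs execute in waves."""
    if runs < 1 or parallelism < 1:
        raise ValueError("runs and parallelism must both be >= 1")
    waves = math.ceil(runs / parallelism)  # number of sequential batches
    return runs / waves


# One arm with K = 4 seeds x 2 conditions = 8 runs
print(ideal_speedup(8, 4))  # 4.0
# One arm with K = 3 seeds x 2 conditions = 6 runs (last wave is half empty)
print(ideal_speedup(6, 4))  # 3.0
# parallelism: 1 is the sequential baseline
print(ideal_speedup(8, 1))  # 1.0
```

Real speedups will likely be lower than this bound when run durations differ or when concurrent runs contend for cores, memory, or disk.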
Design
Campaign config
    execution:
      parallelism: 4  # max concurrent blis runs within a single arm
The field belongs under execution: to leave room for future execution hints (e.g. timeout_per_run, retry_on_nonzero).
Defaults
defaults.yaml should carry the default (parallelism: 1) under execution:, consistent with how models: and max_turns: work today, so that campaigns that don't set it explicitly get sequential behavior and the default is visible in one place.
How the hint reaches the agent
LLMDispatcher._build_context() already builds the context dict injected into the executor prompt. Add two lines in the if phase == "execute-analyze": block:
    # "execution" may be absent or null in the campaign YAML; fall back to sequential.
    parallelism = (self.campaign.get("execution") or {}).get("parallelism", 1)
    ctx["parallelism"] = str(parallelism)
The prompt template then surfaces {{parallelism}} to the agent.
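A minimal, self-contained sketch of that flow, from config lookup to rendered prompt line, is shown here. It makes two assumptions: that defaults.yaml is available as a separate dict consulted after the campaign, and that `{{key}}` placeholders are filled by plain string substitution. Both `resolve_parallelism` and `render` are hypothetical stand-ins, not the dispatcher's actual code.

```python
from typing import Any


def resolve_parallelism(campaign: dict[str, Any], defaults: dict[str, Any]) -> int:
    """Campaign value wins; otherwise fall back to defaults, then to 1."""
    for source in (campaign, defaults):
        execution = source.get("execution") or {}  # tolerate a missing or null block
        if "parallelism" in execution:
            return int(execution["parallelism"])
    return 1


def render(template: str, ctx: dict[str, str]) -> str:
    """Naive {{key}} substitution, standing in for the real template engine."""
    for key, value in ctx.items():
        template = template.replace("{{" + key + "}}", value)
    return template


defaults = {"execution": {"parallelism": 1}}
campaign = {"execution": {"parallelism": 4}}

# Context values are strings, matching ctx["parallelism"] = str(parallelism)
ctx = {"parallelism": str(resolve_parallelism(campaign, defaults))}
print(render("## Parallelism budget: {{parallelism}} concurrent run(s)", ctx))
# ## Parallelism budget: 4 concurrent run(s)
```

A campaign with no execution: block resolves to 1 here, so the rendered budget reads "1 concurrent run(s)" and the agent stays sequential.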
Executor prompt guidance
A new section in execute_analyze.md Phase 2 should tell the agent precisely when parallelism is safe and when it is not:
    ## Parallelism budget: {{parallelism}} concurrent run(s)

    When parallelism > 1, you may run up to {{parallelism}} conditions concurrently
    using bash background jobs and wait:

        cmd_a & cmd_b & cmd_c & wait

    **Safety rules (read carefully):**

    1. WITHIN an arm: always safe to parallelize across seeds and configs.
       All conditions use the same code state (same patch applied or no patch).
       Each condition writes to a distinct output path. No conflicts.
    2. ACROSS arms with NO code_changes: safe to parallelize.
       No git state is modified; runs are pure binary invocations.
    3. ACROSS arms with different code_changes: NOT safe to parallelize.
       Each arm requires a different patch applied to the worktree.
       Run all conditions for arm A (patch applied, seeds in parallel), then
       `git checkout -- .`, then arm B. Never apply two patches simultaneously.
    4. Output path discipline: every condition must write to a UNIQUE path
       under {{iter_dir}}/results/<arm_id>/. Verify paths are distinct before
       launching background jobs. Two jobs writing the same path will corrupt results.
    5. Exit-code checking: a bare `wait` returns 0 even if a background job failed,
       so after `wait`, check that each output file exists and is non-empty.
       A backgrounded job that fails silently typically leaves no output file.
       Treat a missing or empty output as a run failure and re-run that condition
       before analyzing.
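The checks in rules 4 and 5 are simple enough to state as code. The proposal has the agent perform them in bash; the Python sketch below is only an illustration of the same logic. The helper names and the file names under results/ are hypothetical.

```python
import tempfile
from collections import Counter
from pathlib import Path


def duplicate_paths(paths: list[str]) -> list[str]:
    """Rule 4: return every planned output path that appears more than once."""
    # as_posix() on a Path collapses "./" and repeated slashes before comparing
    counts = Counter(Path(p).as_posix() for p in paths)
    return sorted(p for p, n in counts.items() if n > 1)


def failed_outputs(paths: list[str]) -> list[str]:
    """Rule 5: return outputs that are missing or empty after the jobs finish."""
    return [p for p in paths if not Path(p).is_file() or Path(p).stat().st_size == 0]


# Before launching: catch an accidental repeat in the planned outputs
planned = [
    "results/arm_a/seed1_baseline.json",
    "results/arm_a/seed1_treatment.json",
    "results/arm_a/seed1_baseline.json",
]
print(duplicate_paths(planned))  # ['results/arm_a/seed1_baseline.json']

# After `wait`: one good file, one empty file, one that was never written
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "seed1_baseline.json"
    empty = Path(tmp) / "seed1_treatment.json"
    missing = Path(tmp) / "seed2_baseline.json"
    good.write_text('{"ok": true}')
    empty.touch()
    bad = failed_outputs([str(good), str(empty), str(missing)])
    print([Path(p).name for p in bad])  # ['seed1_treatment.json', 'seed2_baseline.json']
```

Any path returned by `failed_outputs` corresponds to a condition that would be re-run before analysis.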
Files to change
| File | Change |
| --- | --- |
| schemas/campaign.schema.yaml | Add execution object with parallelism: integer, minimum: 1 |
| defaults.yaml | Add execution: { parallelism: 1 } |
| orchestrator/llm_dispatch.py | Thread parallelism into _build_context() for execute-analyze phase |
| prompts/methodology/execute_analyze.md | Add parallelism safety section in Phase 2 |
No changes to run_iteration.py, run_campaign.py, engine.py, the state machine, or findings schemas.
Safety invariants the implementation must enforce
- No shared output paths: two concurrent jobs must never write the same file. The prompt must instruct the agent to verify path uniqueness before launching background jobs.
- No concurrent git state mutation: arms with code_changes must be run strictly sequentially at the arm level, even if conditions within that arm are parallelized.
- Exit-code / output-file verification: after every wait, the agent must confirm each expected output file exists and is non-empty before proceeding to analysis.
- Default is sequential: parallelism: 1 must be the default so existing campaigns are unaffected.
- Schema validation: parallelism must be minimum: 1 (zero is not a valid concurrency level).
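To make the schema invariant concrete, here is a small sketch of what "integer, minimum: 1" accepts and rejects. It mirrors the constraint in plain Python and is not a replacement for schema validation. `validate_parallelism` is a hypothetical helper.

```python
def validate_parallelism(value: object) -> int:
    """Accept only integers >= 1, mirroring the proposed schema constraint."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"parallelism must be an integer, got {value!r}")
    if value < 1:
        raise ValueError(f"parallelism must be >= 1, got {value}")
    return value


print(validate_parallelism(4))  # 4

for bad_value in (0, -2, "4", True):
    try:
        validate_parallelism(bad_value)
    except (TypeError, ValueError) as err:
        print(f"rejected {bad_value!r}: {err}")
```

Rejecting zero at load time keeps an invalid value out of the {{parallelism}} budget line of the executor prompt.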
Non-goals
- Orchestrator-level concurrency (concurrent.futures, asyncio, threads in Python)
- Multiple git worktrees per arm
- Parallelism across iterations (iterations are intentionally sequential — each feeds the next)