feat: configurable parallelism for experiment conditions within an iteration #82

@sriumcp

Summary

Add a parallelism knob to the campaign config that the executor agent uses to run independent conditions concurrently within a single arm. No multiple worktrees, no orchestrator-level concurrency — the parallelism lives entirely in the agent's bash execution inside the existing single worktree.

Motivation

A hypothesis bundle with N arms × K seeds × 2 conditions (baseline + treatment) produces N × K × 2 sequential blis run invocations today. These runs are fully independent: same binary, different flags, different output paths. On a machine with 8+ cores, a parallelism: 4 hint could cut experiment wall-time by 3–4×, directly reducing --timeout pressure and the probability of mid-iteration failures (#80).
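
The 3–4× estimate can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope model: it assumes every run takes the same time and that runs within one arm are launched in batches of at most `parallelism`, each batch waiting for its slowest member. The function name `ideal_speedup` is made up for illustration.

```python
import math

def ideal_speedup(seeds, parallelism, conditions=2):
    """Return (runs_per_arm, batches, speedup) under an equal-duration model."""
    runs_per_arm = seeds * conditions              # K seeds x 2 conditions
    batches = math.ceil(runs_per_arm / parallelism)
    # Sequential cost is runs_per_arm time units; batched cost is `batches` units.
    return runs_per_arm, batches, runs_per_arm / batches

if __name__ == "__main__":
    for seeds in (3, 4):
        print(seeds, ideal_speedup(seeds, parallelism=4))
```

With 4 seeds the 8 runs per arm fit into 2 full batches, so the ideal gain is 4×; with 3 seeds the second batch is half empty and the gain drops to 3×. Real runs vary in duration, so actual gains may differ.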

Design

Campaign config

execution:
  parallelism: 4   # max concurrent blis runs within a single arm

The field belongs under execution: to leave room for future execution hints (e.g. timeout_per_run, retry_on_nonzero).

Defaults

defaults.yaml should carry a default (e.g. parallelism: 1) under execution:, consistent with how models: and max_turns: work today — so campaigns that don't set it explicitly get sequential behavior and the default is visible in one place.
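
How defaults are combined with a campaign file is not spelled out here, so the following is only a sketch of one plausible approach: a recursive merge in which campaign values override defaults. `DEFAULTS` stands in for the parsed defaults.yaml, and `merge_defaults` is a hypothetical helper, not an existing function in the codebase.

```python
from copy import deepcopy

# Hypothetical stand-in for the parsed contents of defaults.yaml.
DEFAULTS = {"execution": {"parallelism": 1}}

def merge_defaults(defaults, campaign):
    """Return a new dict where campaign values override defaults.

    Nested dicts are merged key by key; any other value is replaced wholesale.
    Neither input is modified.
    """
    merged = deepcopy(defaults)
    for key, value in campaign.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_defaults(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged

if __name__ == "__main__":
    print(merge_defaults(DEFAULTS, {}))
    print(merge_defaults(DEFAULTS, {"execution": {"parallelism": 4}}))
```

A campaign that omits `execution:` entirely keeps sequential behavior, while one that sets only `parallelism` still inherits any other execution defaults added later.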

How the hint reaches the agent

LLMDispatcher._build_context() already builds the context dict injected into the executor prompt. Add two lines in the if phase == "execute-analyze": block:

# `or {}` also covers a campaign that declares `execution:` with no value (None).
parallelism = (self.campaign.get("execution") or {}).get("parallelism", 1)
ctx["parallelism"] = str(parallelism)

The prompt template then surfaces {{parallelism}} to the agent.
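
To make the data flow concrete, here is a standalone sketch of the path from campaign dict to rendered prompt line. Both `build_execute_context` and `render` are hypothetical stand-ins: the real `_build_context()` builds a larger dict, and the actual template engine may substitute placeholders differently.

```python
import re

def build_execute_context(campaign):
    """Build only the parallelism part of the executor context (illustrative)."""
    execution = campaign.get("execution") or {}
    return {"parallelism": str(execution.get("parallelism", 1))}

def render(template, ctx):
    """Replace each {{name}} with ctx[name]; unknown placeholders stay as-is."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda match: ctx.get(match.group(1), match.group(0)),
        template,
    )

if __name__ == "__main__":
    ctx = build_execute_context({"execution": {"parallelism": 4}})
    print(render("## Parallelism budget: {{parallelism}} concurrent run(s)", ctx))
```

Converting the value to a string before it enters the context mirrors the two lines above and keeps the substitution step free of type handling.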

Executor prompt guidance

A new section in execute_analyze.md Phase 2 should tell the agent precisely when parallelism is safe and when it is not:

## Parallelism budget: {{parallelism}} concurrent run(s)

When parallelism > 1, you may run up to {{parallelism}} conditions concurrently
using bash background jobs and wait:

    cmd_a & cmd_b & cmd_c & wait

**Safety rules — read carefully:**

1. WITHIN an arm: always safe to parallelize across seeds and configs.
   All conditions use the same code state (same patch applied or no patch).
   Each condition writes to a distinct output path. No conflicts.

2. ACROSS arms with NO code_changes: safe to parallelize.
   No git state is modified; runs are pure binary invocations.

3. ACROSS arms with different code_changes: NOT safe to parallelize.
   Each arm requires a different patch applied to the worktree.
   Run all conditions for arm A (patch applied, seeds in parallel), then
   `git checkout -- .`, then arm B. Note that `git checkout -- .` reverts
   tracked files only; also remove any new files the patch added.
   Never apply two patches simultaneously.

4. Output path discipline: every condition must write to a UNIQUE path
   under {{iter_dir}}/results/<arm_id>/. Verify paths are distinct before
   launching background jobs. Two jobs writing the same path will corrupt results.

5. Exit-code checking: a bare `wait` returns 0 even when a background job
   failed, so record each job's PID (`$!`) and use `wait <pid>` to read its
   exit status. Also check that each output file exists and is non-empty.
   Treat a nonzero exit status or a missing or empty output as a run failure
   and re-run that condition before analyzing.
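
Safety rules 1 to 4 amount to a small scheduling policy, which the sketch below expresses as a pure function. It is an illustration only: the agent would apply the same logic in bash, and the `arms` structure (dicts with `id`, `code_changes`, and `outputs` keys) and the names `chunk` and `plan_batches` are assumptions made for this example. Each returned batch is a set of conditions that may run concurrently; batches run one after another.

```python
def chunk(items, size):
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def plan_batches(arms, parallelism):
    """Group condition output paths into batches that respect rules 1-4.

    `arms` is a hypothetical structure: a list of dicts with keys "id",
    "code_changes" (bool), and "outputs" (one output path per condition).
    """
    all_outputs = [path for arm in arms for path in arm["outputs"]]
    if len(set(all_outputs)) != len(all_outputs):
        raise ValueError("output paths must be unique (rule 4)")

    batches = []
    # Rule 2: arms without code changes share one code state, so pool them.
    unpatched = [p for arm in arms if not arm["code_changes"] for p in arm["outputs"]]
    batches.extend(chunk(unpatched, parallelism))
    # Rules 1 and 3: a patched arm parallelizes internally but never mixes
    # with another arm, because each needs its own patch on the worktree.
    for arm in arms:
        if arm["code_changes"]:
            batches.extend(chunk(arm["outputs"], parallelism))
    return batches

if __name__ == "__main__":
    arms = [
        {"id": "a", "code_changes": False, "outputs": ["results/a/s1", "results/a/s2"]},
        {"id": "b", "code_changes": False, "outputs": ["results/b/s1"]},
        {"id": "c", "code_changes": True, "outputs": ["results/c/s1", "results/c/s2", "results/c/s3"]},
    ]
    for batch in plan_batches(arms, parallelism=2):
        print(batch)
```

Running the unpatched arms first is a choice made for this sketch; it means those runs finish before any patch touches the worktree.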

Files to change

| File | Change |
|---|---|
| schemas/campaign.schema.yaml | Add execution object with parallelism: integer, minimum: 1 |
| defaults.yaml | Add execution: { parallelism: 1 } |
| orchestrator/llm_dispatch.py | Thread parallelism into _build_context() for execute-analyze phase |
| prompts/methodology/execute_analyze.md | Add parallelism safety section in Phase 2 |

No changes to run_iteration.py, run_campaign.py, engine.py, the state machine, or findings schemas.

Safety invariants the implementation must enforce

  • No shared output paths: two concurrent jobs must never write the same file. The prompt must instruct the agent to verify path uniqueness before launching background jobs.
  • No concurrent git state mutation: arms with code_changes must be run strictly sequentially at the arm level, even if conditions within that arm are parallelized.
  • Exit-code / output-file verification: after every wait, the agent must confirm each expected output file exists and is non-empty before proceeding to analysis.
  • Default is sequential: parallelism: 1 must be the default so existing campaigns are unaffected.
  • Schema validation: the schema must declare parallelism as an integer with minimum: 1 (zero is not a valid concurrency level).
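
Two of these invariants reduce to simple checks, sketched below using only the standard library. Both helper names are hypothetical. In the proposed design the schema validator enforces the first check and the agent performs the second in bash, so this code only illustrates what each check must accept and reject.

```python
from pathlib import Path

def validate_parallelism(value):
    """Return value if it is an integer >= 1, else raise ValueError."""
    # bool is a subclass of int in Python, so reject True/False explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value < 1:
        raise ValueError(f"parallelism must be an integer >= 1, got {value!r}")
    return value

def failed_outputs(expected_paths):
    """Return the expected output paths that are missing or empty."""
    failed = []
    for path in map(Path, expected_paths):
        if not path.is_file() or path.stat().st_size == 0:
            failed.append(str(path))
    return failed

if __name__ == "__main__":
    print(validate_parallelism(4))
    print(failed_outputs(["results/arm_a/seed1.json"]))
```

Any path returned by `failed_outputs` corresponds to a condition that should be re-run before analysis begins.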

Non-goals

  • Orchestrator-level concurrency (concurrent.futures, asyncio, threads in Python)
  • Multiple git worktrees per arm
  • Parallelism across iterations (iterations are intentionally sequential — each feeds the next)

Labels: enhancement
