Merged
2 changes: 2 additions & 0 deletions README.md
@@ -123,6 +123,8 @@ When `repo_path` is set, the campaign directory is created inside the target rep

The planner explores the codebase to discover metrics, knobs, and execution methods. You can optionally provide `observable_metrics` and `controllable_knobs` as hints — see [examples/campaign.yaml](examples/campaign.yaml) for all options.

If your target is a *running* system rather than a codebase (a cluster, a deployed service, a scratch directory that isn't a git repo), set `target_system.live_target: true`. The executor then runs directly in `repo_path` with no per-iteration `git worktree`, and the planner is told up front that arms must be probes — see [docs/quickstart.md#live-target-campaigns-live_target-true](docs/quickstart.md#live-target-campaigns-live_target-true) for details.
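
As a rough illustration of the rule above, the sketch below picks an execution mode from a campaign dict. The function name `execution_mode` and the mode labels are hypothetical; only the `target_system.repo_path` and `target_system.live_target` keys come from the campaign format described here.

```python
from typing import Any


def execution_mode(campaign: dict[str, Any]) -> str:
    """Return a label for where the executor would run (illustrative only)."""
    ts = campaign.get("target_system", {})
    repo_path = ts.get("repo_path")
    live_target = ts.get("live_target", False)

    if not repo_path:
        # No repo_path: there is no directory to branch from or run in.
        return "no_repo"
    if live_target:
        # Live target: run directly in repo_path, no per-iteration worktree.
        return "live_target"
    # Default: a fresh git worktree per iteration.
    return "worktree"


# Hypothetical campaigns, written as plain dicts instead of YAML.
codebase = {
    "target_system": {"name": "svc", "description": "d", "repo_path": "/src/svc"}
}
cluster = {
    "target_system": {
        "name": "gpu",
        "description": "d",
        "repo_path": "/scratch/cluster-probe",
        "live_target": True,
    }
}

print(execution_mode(codebase))
print(execution_mode(cluster))
```

The two calls print `worktree` and `live_target` respectively; a campaign with no `repo_path` falls through to `no_repo` regardless of the flag.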

### 5. Run a campaign

37 changes: 37 additions & 0 deletions docs/quickstart.md
@@ -125,6 +125,43 @@ After a campaign, your working directory contains:
- **`runs/iter-N/inputs/`** — Agent-created input files (configs, workloads)
- **`runs/iter-N/results/`** — Experiment output files

## Live-target campaigns (`live_target: true`)

By default Nous treats `repo_path` as a git repo and creates a fresh `git worktree` per iteration so that any source-code patches are isolated. For some campaigns there is no codebase to evolve — the thing you want to study is a *running* system: a Kubernetes cluster, a deployed service, a dataset on disk, a non-git scratch directory. Setting `live_target: true` tells Nous to skip worktree creation and run the executor directly inside `repo_path`.

Use it when:

- The target is a live system you are probing, not a codebase you are mutating (e.g. a GPU cluster, a production-like service, a workload generator).
- `repo_path` points at a directory that is not a git repo, or is a git repo whose working tree must not be branched.
- The bundle should only contain probe-style arms (config tweaks, command-line invocations, observation runs) — never `code_changes`.

Example:

```yaml
research_question: >
  Why does p99 latency spike when the cluster autoscaler kicks in?

target_system:
  name: "Staging GPU cluster"
  description: >
    Live Kubernetes cluster running our inference workload.
    The agent probes the cluster via kubectl and Prometheus; it does
    not modify source code.
  repo_path: /scratch/cluster-probe  # any working directory; need not be a git repo
  live_target: true

prompts:
  methodology_layer: "prompts/methodology"
  domain_adapter_layer: null
```

How `live_target` differs from regular observe-mode arms:

- **Observe mode** is a *bundle-level* property — an individual arm has no `code_changes`, so the executor skips patching and just runs commands. The campaign can still mix observe arms and evolve arms in the same bundle, and a worktree is still created.
- **`live_target: true`** is a *campaign-level* property — it controls the *executor environment* (no worktree, run in `repo_path` directly) and tells the planner up front that the target is a shared running system, so every arm must be a probe. Bundles with `code_changes` arms are incoherent in this mode.

Pick `live_target: true` when there is nothing meaningful to branch from; pick observe-mode arms when you have a real codebase but a particular iteration only needs to measure, not patch.
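
The rule that a live-target bundle carries no `code_changes` arms could be checked mechanically. The sketch below assumes a bundle is a dict with an `arms` list whose entries may carry `id` and `code_changes` keys; that layout and the helper names are assumptions for illustration, not the Nous bundle schema.

```python
from typing import Any


def arms_with_code_changes(bundle: dict[str, Any]) -> list[str]:
    """Return ids of arms that carry a non-empty `code_changes` entry."""
    offenders = []
    for index, arm in enumerate(bundle.get("arms", [])):
        if arm.get("code_changes"):
            # Fall back to a positional label when the arm has no id.
            offenders.append(str(arm.get("id", f"arm-{index}")))
    return offenders


def check_live_target_bundle(campaign: dict[str, Any], bundle: dict[str, Any]) -> None:
    """Raise if a live_target campaign is paired with code-changing arms."""
    live = campaign.get("target_system", {}).get("live_target", False)
    offenders = arms_with_code_changes(bundle)
    if live and offenders:
        raise ValueError(f"live_target campaign has code_changes arms: {offenders}")


# Hypothetical data: one probe arm and one arm that patches code.
live_campaign = {"target_system": {"repo_path": "/scratch/cluster-probe", "live_target": True}}
mixed_bundle = {
    "arms": [
        {"id": "h-probe"},
        {"id": "h-main", "code_changes": ["tune scheduler"]},
    ]
}

try:
    check_live_target_bundle(live_campaign, mixed_bundle)
except ValueError as exc:
    print(exc)
```

The same mixed bundle would pass for a campaign without `live_target`, which matches the observe-mode case where probe and patch arms can coexist.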

## Choosing a model

Defaults (from `defaults.yaml`):
Expand Down
18 changes: 17 additions & 1 deletion orchestrator/iteration.py
@@ -255,6 +255,13 @@ def run_iteration(
     Returns:
         An IterationOutcome value: COMPLETED, CONTINUE, ABORTED, or REDESIGN.
     """
+    # Validate the campaign once, up front. The staticmethod on LLMDispatcher
+    # is also called from its constructor, but inline-agent mode never builds
+    # an LLMDispatcher — without this call, a non-bool `live_target` value
+    # would slip past validation and silently coerce via bool() below.
+    from orchestrator.llm_dispatch import validate_campaign
+    validate_campaign(campaign)
+
     engine = Engine(work_dir)
     repo_path = campaign.get("target_system", {}).get("repo_path")
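
The coercion risk mentioned in the comment above is easy to reproduce: `bool()` treats any non-empty string as true, so a YAML value quoted as `"false"` would switch live-target mode on if it reached `bool(...)` unvalidated. The sketch below uses `check_live_target`, a hypothetical simplified stand-in for the `live_target` branch of `validate_campaign`, so that it runs without the orchestrator package.

```python
def check_live_target(target_system: dict) -> None:
    """Reject any `live_target` value that is not a real bool (simplified stand-in)."""
    if "live_target" in target_system and not isinstance(
        target_system["live_target"], bool
    ):
        raise ValueError(
            "Campaign 'target_system.live_target' must be a bool. "
            f"Got: {target_system['live_target']!r}"
        )


# A quoted YAML scalar arrives as a string, not a bool.
quoted = {"live_target": "false"}

# Without validation, the string silently coerces to True.
coerced = bool(quoted.get("live_target", False))

# With validation, the same value is rejected before it can be coerced.
try:
    check_live_target(quoted)
    rejected = False
except ValueError:
    rejected = True

print(coerced, rejected)
```

Checking with `isinstance(..., bool)` also rejects integers such as `1`, since `int` is not a subclass of `bool`.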

@@ -370,7 +377,10 @@ def _max_turns_for(phase_key: str) -> int:
         cli_dispatcher.model = _model_for("execute_analyze")
         cli_dispatcher.max_turns = _max_turns_for("execute_analyze")
     exec_dispatcher = cli_dispatcher or llm_dispatcher
-    if repo_path:
+    live_target = bool(
+        campaign.get("target_system", {}).get("live_target", False)
+    )
+    if repo_path and not live_target:
         from orchestrator.worktree import (
             create_experiment_worktree,
             remove_experiment_worktree,
@@ -380,6 +390,12 @@ def _max_turns_for(phase_key: str) -> int:
         )
         (iter_dir / ".experiment_id").write_text(experiment_id)
         print(f" Experiment worktree: {experiment_dir}")
+    elif repo_path:
+        # Live-target mode: executor runs directly in repo_path. The
+        # target system is running (cluster, service, dataset) and there
+        # is nothing to isolate — bundles must contain no code_changes arms.
+        experiment_dir = Path(repo_path)
+        print(f" Live target: executor runs in {experiment_dir}")
     if cli_dispatcher:
         import contextlib
         ctx = cli_dispatcher.override_cwd(experiment_dir) if experiment_dir else contextlib.nullcontext()
109 changes: 85 additions & 24 deletions orchestrator/llm_dispatch.py
@@ -35,6 +35,85 @@
 # Schema cache: schema_name -> parsed schema dict
 _schema_cache: dict[str, dict] = {}

+# Prompt fragments that swap based on target_system.live_target. Worktree
+# mode is the default — code-evolution campaigns get an isolated git worktree
+# per iteration. Live-target mode is for running systems (clusters, services,
+# datasets) that the executor probes without per-iteration code mutation.
+# (The flag is `live_target` rather than `observational` to avoid colliding
+# with the existing "observe mode" in execute_analyze.md, which means
+# "the bundle has no code_changes arms.")
+_WORKTREE_EXECUTION_ENV = (
+    "You are running inside an isolated git worktree of the target system. "
+    "You own this worktree — reset it yourself with `git checkout -- .` "
+    "between conditions."
+)
+_LIVE_TARGET_EXECUTION_ENV = (
+    "You are running directly against a live target system, in its working "
+    "directory. There is no per-iteration git isolation, and your bundle "
+    "must contain no `code_changes` arms. Do not mutate the target system's "
+    "persistent state — your job is to probe, measure, and report. Treat "
+    "any files you create as scratch artifacts that belong under "
+    "`{{iter_dir}}/inputs/` or `{{iter_dir}}/results/`, not in the target "
+    "directory."
+)
+_WORKTREE_DESIGN_CONSTRAINT = (
+    "**Worktree isolation assumed.** The executor runs in a clean git "
+    "worktree. Each condition starts from clean state (`git checkout -- .` "
+    "runs between conditions). Design your experimental conditions assuming "
+    "this — don't include manual cleanup steps."
+)
+_LIVE_TARGET_DESIGN_CONSTRAINT = (
+    "**Live target system.** The executor runs directly against a running "
+    "system — no git worktree, no code-change arms. All arms must be pure "
+    "observations of system state (probes, metrics, log scrapes). Do not "
+    "include `code_changes` in any arm; do not assume mutation is possible "
+    "without explicit consent gates."
+)
+
+# Per-condition reset step in execute_analyze.md Phase 2. Worktree mode resets
+# tracked files between conditions; live-target mode has no checkout to
+# revert and instead reminds the agent not to mutate the live target.
+_WORKTREE_CONDITION_RESET = "Reset worktree: `git checkout -- .`"
+_LIVE_TARGET_CONDITION_RESET = (
+    "Do not mutate the target system between conditions. Any files you "
+    "wrote to the target directory during the previous condition must be "
+    "removed before the next one runs (this is your responsibility — "
+    "there is no automatic checkout)."
+)
+
+
+def validate_campaign(campaign: dict) -> None:
+    """Validate campaign config. Module-level so it can be called before any
+    dispatcher is constructed (e.g., from `run_iteration` in inline-agent mode,
+    where no LLMDispatcher is built and the staticmethod path is never taken).
+    """
+    ts = campaign.get("target_system")
+    if not isinstance(ts, dict):
+        raise ValueError(
+            "Campaign config missing 'target_system' section. "
+            "See examples/campaign.yaml for the expected format."
+        )
+    required = ["name", "description"]
+    missing = [k for k in required if k not in ts]
+    if missing:
+        raise ValueError(
+            f"Campaign 'target_system' missing required keys: {missing}. "
+            f"See examples/campaign.yaml for the expected format."
+        )
+    for field in ("observable_metrics", "controllable_knobs"):
+        val = ts.get(field)
+        if val is not None:
+            if not isinstance(val, list) or not all(isinstance(x, str) for x in val):
+                raise ValueError(
+                    f"Campaign 'target_system.{field}' must be a list of strings. "
+                    f"Got: {val!r}"
+                )
+    if "live_target" in ts and not isinstance(ts["live_target"], bool):
+        raise ValueError(
+            f"Campaign 'target_system.live_target' must be a bool. "
+            f"Got: {ts['live_target']!r}"
+        )
+

 class LLMDispatcher:
     """Dispatch agent roles to an LLM and produce schema-conformant artifacts."""
@@ -50,7 +129,7 @@ def __init__(
         completion_fn: Callable | None = None,
     ) -> None:
         self.work_dir = Path(work_dir)
-        self._validate_campaign(campaign)
+        validate_campaign(campaign)
         self.campaign = campaign
         self.model = model
         self.loader = PromptLoader(
@@ -84,29 +163,7 @@ def __init__(
             dal,
         )

-    @staticmethod
-    def _validate_campaign(campaign: dict) -> None:
-        ts = campaign.get("target_system")
-        if not isinstance(ts, dict):
-            raise ValueError(
-                "Campaign config missing 'target_system' section. "
-                "See examples/campaign.yaml for the expected format."
-            )
-        required = ["name", "description"]
-        missing = [k for k in required if k not in ts]
-        if missing:
-            raise ValueError(
-                f"Campaign 'target_system' missing required keys: {missing}. "
-                f"See examples/campaign.yaml for the expected format."
-            )
-        for field in ("observable_metrics", "controllable_knobs"):
-            val = ts.get(field)
-            if val is not None:
-                if not isinstance(val, list) or not all(isinstance(x, str) for x in val):
-                    raise ValueError(
-                        f"Campaign 'target_system.{field}' must be a list of strings. "
-                        f"Got: {val!r}"
-                    )
+    _validate_campaign = staticmethod(validate_campaign)

     # ------------------------------------------------------------------
     # Public interface (satisfies Dispatcher protocol)
@@ -212,13 +269,17 @@ def _build_context(
         perspective: str | None,
     ) -> dict[str, str]:
         ts = self.campaign["target_system"]
+        live_target = bool(ts.get("live_target", False))
         ctx: dict[str, str] = {
             "target_system": ts["name"],
             "system_description": ts["description"],
             "observable_metrics": ", ".join(ts["observable_metrics"]) if ts.get("observable_metrics") else "Not specified — planner should discover from code",
             "controllable_knobs": ", ".join(ts["controllable_knobs"]) if ts.get("controllable_knobs") else "Not specified — planner should discover from code",
             "active_principles": self._format_principles(),
             "iteration": str(iteration),
+            "execution_environment": _LIVE_TARGET_EXECUTION_ENV if live_target else _WORKTREE_EXECUTION_ENV,
+            "worktree_constraint": _LIVE_TARGET_DESIGN_CONSTRAINT if live_target else _WORKTREE_DESIGN_CONSTRAINT,
+            "condition_reset": _LIVE_TARGET_CONDITION_RESET if live_target else _WORKTREE_CONDITION_RESET,
         }

         if phase == "design":
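
The three new context keys line up with the `{{execution_environment}}`, `{{worktree_constraint}}`, and `{{condition_reset}}` placeholders introduced in the prompt templates. How `PromptLoader` actually renders templates is not shown in this diff, so the sketch below uses a hypothetical `render` helper based on plain string replacement, with abbreviated fragment text, purely to illustrate the swap.

```python
# Abbreviated stand-ins for the module-level fragments (not the full text).
WORKTREE_RESET = "Reset worktree: `git checkout -- .`"
LIVE_TARGET_RESET = "Do not mutate the target system between conditions."


def render(template: str, ctx: dict[str, str]) -> str:
    """Replace each {{key}} placeholder with its value (hypothetical renderer)."""
    for key, value in ctx.items():
        template = template.replace("{{" + key + "}}", value)
    return template


def condition_reset_for(target_system: dict) -> str:
    """Pick the per-condition reset fragment from the live_target flag."""
    live_target = bool(target_system.get("live_target", False))
    return LIVE_TARGET_RESET if live_target else WORKTREE_RESET


step = "1. {{condition_reset}}"
for ts in ({"live_target": True}, {}):
    print(render(step, {"condition_reset": condition_reset_for(ts)}))
```

Keeping the fragments as module-level constants and selecting them in one place means the templates need no conditional logic of their own.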
3 changes: 3 additions & 0 deletions orchestrator/schemas/campaign.schema.yaml
@@ -53,6 +53,9 @@ properties:
         type: ["string", "null"]
         minLength: 1
         description: "Path to target system git repo. Used by CLIDispatcher for code-access agents. If set, experiments run in isolated worktrees."
+      live_target:
+        type: boolean
+        description: "If true, the executor runs directly in repo_path with no per-iteration git worktree. Use for campaigns that probe a running system (cluster, service, dataset) where there is no code to evolve. Bundles must contain no code_changes arms."

   models:
     type: object
2 changes: 1 addition & 1 deletion prompts/methodology/design.md
@@ -158,7 +158,7 @@ Now design a hypothesis bundle based on what you actually observed and verified:
 - Predictions must be directional, falsifiable, and reference specific observable metrics. Do not invent arbitrary numeric thresholds unless campaign.yaml specifies them.
 - Base all experiment parameters on verified system behavior — if you didn't probe it, don't assume it.
 - **No `sed`/`awk` for code changes.** When describing code modifications in problem framing or bundle arms, describe the *intent* (what to change and why). The executor agent will implement changes properly via file edits, verify they compile, and create reusable `git diff` patches. Never suggest inline shell regex as an implementation strategy.
-- **Worktree isolation assumed.** The executor runs in a clean git worktree. Each condition starts from clean state (`git checkout -- .` runs between conditions). Design your experimental conditions assuming this — don't include manual cleanup steps.
+- {{worktree_constraint}}

 ## Output — Write Files Directly

8 changes: 4 additions & 4 deletions prompts/methodology/execute_analyze.md
@@ -1,6 +1,6 @@
 You are a scientific executor for the Nous hypothesis-driven experimentation framework.

-You have **shell access**. You are running inside an isolated git worktree of the target system. You own this worktree — reset it yourself with `git checkout -- .` between conditions.
+You have **shell access**. {{execution_environment}}

 Your job has FIVE phases — all in one session with full context:
 1. **Prepare** — build, create patches, validate ALL commands
@@ -105,7 +105,7 @@ arms:

 **Important:**
-- All output paths MUST use absolute paths under `{{iter_dir}}/results/`. Do NOT use relative paths — the experiment runs in a worktree that gets cleaned up.
+- All output paths MUST use absolute paths under `{{iter_dir}}/results/`. Do NOT use relative paths — only files under `{{iter_dir}}/` are guaranteed to persist past this session.
 - Create per-arm result subdirectories before writing output: `mkdir -p {{iter_dir}}/results/<arm_id>` (the top-level `results/` already exists, but per-arm subdirectories like `results/h-main/` do not).
 - If you create ANY input files for the experiment (config files, workload specs, policy definitions, parameter files), write them to `{{iter_dir}}/inputs/` and list them in the condition's `inputs` array. Do NOT write input files to `/tmp/` or other temporary locations — they will be lost and the experiment will not be reproducible.

@@ -114,13 +114,13 @@ arms:
 Run the experiment plan you wrote in Step 4 — execute every command exactly as written. The plan is the source of truth.

 For each condition:
-1. Reset worktree: `git checkout -- .`
+1. {{condition_reset}}
 2. Run the `cmd` from the plan
 3. Verify the `output` file was created at the expected path

 After each baseline+treatment pair with the same seed, compare key metrics. If they are byte-identical, STOP and investigate — the patch may not be affecting the code path.

-**All results must land in `{{iter_dir}}/results/`.** The worktree is temporary — anything written there will be lost.
+**All results must land in `{{iter_dir}}/results/`.** Only files under `{{iter_dir}}/` are guaranteed to persist — anything written elsewhere may be lost.

 ## Phase 3: Analyze and Write Findings