AI-native-Systems-Research · mtoslalibu · Jun 2, 2026 · May 26, 2026 · May 26, 2026 · May 27, 2026
diff --git a/README.md b/README.md
@@ -123,6 +123,8 @@ When `repo_path` is set, the campaign directory is created inside the target rep
 
 The planner explores the codebase to discover metrics, knobs, and execution methods. You can optionally provide `observable_metrics` and `controllable_knobs` as hints — see [examples/campaign.yaml](examples/campaign.yaml) for all options.
 
+If your target is a *running* system rather than a codebase (a cluster, a deployed service, a scratch directory that isn't a git repo), set `target_system.live_target: true`. The executor then runs directly in `repo_path` with no per-iteration `git worktree`, and the planner is told up front that arms must be probes — see [docs/quickstart.md#live-target-campaigns-live_target-true](docs/quickstart.md#live-target-campaigns-live_target-true) for details.
+
 ### 5. Run a campaign
 
 ```bash

diff --git a/docs/quickstart.md b/docs/quickstart.md
@@ -125,6 +125,43 @@ After a campaign, your working directory contains:
 - **`runs/iter-N/inputs/`** — Agent-created input files (configs, workloads)
 - **`runs/iter-N/results/`** — Experiment output files
 
+## Live-target campaigns (`live_target: true`)
+
+By default Nous treats `repo_path` as a git repo and creates a fresh `git worktree` per iteration so that any source-code patches are isolated. For some campaigns there is no codebase to evolve — the thing you want to study is a *running* system: a Kubernetes cluster, a deployed service, a dataset on disk, a non-git scratch directory. Setting `live_target: true` tells Nous to skip worktree creation and run the executor directly inside `repo_path`.
+
+Use it when:
+
+- The target is a live system you are probing, not a codebase you are mutating (e.g. a GPU cluster, a production-like service, a workload generator).
+- `repo_path` points at a directory that is not a git repo, or is a git repo whose working tree must not be branched.
+- The bundle should only contain probe-style arms (config tweaks, command-line invocations, observation runs) — never `code_changes`.
+
+Example:
+
+```yaml
+research_question: >
+  Why does p99 latency spike when the cluster autoscaler kicks in?
+
+target_system:
+  name: "Staging GPU cluster"
+  description: >
+    Live Kubernetes cluster running our inference workload.
+    The agent probes the cluster via kubectl and Prometheus; it does
+    not modify source code.
+  repo_path: /scratch/cluster-probe   # any working directory; need not be a git repo
+  live_target: true
+
+prompts:
+  methodology_layer: "prompts/methodology"
+  domain_adapter_layer: null
+```
+
+How `live_target` differs from regular observe-mode arms:
+
+- **Observe mode** is a *bundle-level* property — an individual arm has no `code_changes`, so the executor skips patching and just runs commands. The campaign can still mix observe arms and evolve arms in the same bundle, and a worktree is still created.
+- **`live_target: true`** is a *campaign-level* property — it controls the *executor environment* (no worktree, run in `repo_path` directly) and tells the planner up front that the target is a shared running system, so every arm must be a probe. Bundles with `code_changes` arms are incoherent in this mode.
+
+Pick `live_target: true` when there is nothing meaningful to branch from; pick observe-mode arms when you have a real codebase but a particular iteration only needs to measure, not patch.
+
 ## Choosing a model
 
 Defaults (from `defaults.yaml`):

diff --git a/orchestrator/iteration.py b/orchestrator/iteration.py
@@ -255,6 +255,13 @@ def run_iteration(
     Returns:
         An IterationOutcome value: COMPLETED, CONTINUE, ABORTED, or REDESIGN.
     """
+    # Validate the campaign once, up front. LLMDispatcher's constructor runs
+    # the same check, but inline-agent mode never builds an LLMDispatcher.
+    # Without this call, a non-bool `live_target` value would skip
+    # validation and be silently coerced by bool() below.
+    from orchestrator.llm_dispatch import validate_campaign
+    validate_campaign(campaign)
+
     engine = Engine(work_dir)
     repo_path = campaign.get("target_system", {}).get("repo_path")
 
@@ -370,7 +377,10 @@ def _max_turns_for(phase_key: str) -> int:
             cli_dispatcher.model = _model_for("execute_analyze")
             cli_dispatcher.max_turns = _max_turns_for("execute_analyze")
         exec_dispatcher = cli_dispatcher or llm_dispatcher
-        if repo_path:
+        live_target = bool(
+            campaign.get("target_system", {}).get("live_target", False)
+        )
+        if repo_path and not live_target:
             from orchestrator.worktree import (
                 create_experiment_worktree,
                 remove_experiment_worktree,
@@ -380,6 +390,12 @@ def _max_turns_for(phase_key: str) -> int:
             )
             (iter_dir / ".experiment_id").write_text(experiment_id)
             print(f"  Experiment worktree: {experiment_dir}")
+        elif repo_path:
+            # Live-target mode: executor runs directly in repo_path. The
+            # target system is running (cluster, service, dataset) and there
+            # is nothing to isolate — bundles must contain no code_changes arms.
+            experiment_dir = Path(repo_path)
+            print(f"  Live target: executor runs in {experiment_dir}")
         if cli_dispatcher:
             import contextlib
             ctx = cli_dispatcher.override_cwd(experiment_dir) if experiment_dir else contextlib.nullcontext()
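
The `orchestrator/iteration.py` hunks above reduce to a three-way choice of executor directory. A minimal sketch of that decision follows. `choose_experiment_dir`, `make_worktree`, and `fake_worktree` are hypothetical names introduced for illustration; `make_worktree` stands in for the real `create_experiment_worktree`, whose signature this diff does not show, and the sketch assumes `experiment_dir` starts out as `None` when no `repo_path` is set.

```python
from pathlib import Path
from typing import Callable, Optional


def choose_experiment_dir(
    campaign: dict,
    make_worktree: Callable[[Path], Path],
) -> Optional[Path]:
    """Mirror the branch in run_iteration (hypothetical helper, not in the PR).

    - no repo_path               -> None (executor keeps its default cwd)
    - repo_path and live_target  -> repo_path itself, no worktree
    - repo_path only             -> a fresh worktree from make_worktree
    """
    ts = campaign.get("target_system", {})
    repo_path = ts.get("repo_path")
    live_target = bool(ts.get("live_target", False))

    if repo_path and not live_target:
        return make_worktree(Path(repo_path))
    if repo_path:
        return Path(repo_path)
    return None


def fake_worktree(repo: Path) -> Path:
    # Stand-in for git worktree creation; it only computes a path.
    return repo / ".worktrees" / "iter-1"


live = {"target_system": {"repo_path": "/scratch/cluster-probe", "live_target": True}}
default = {"target_system": {"repo_path": "/scratch/cluster-probe"}}

live_dir = choose_experiment_dir(live, fake_worktree)
default_dir = choose_experiment_dir(default, fake_worktree)
```

Note that `live_target: true` without a `repo_path` falls through both branches, so the flag has no effect in that case.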

diff --git a/orchestrator/llm_dispatch.py b/orchestrator/llm_dispatch.py
@@ -35,6 +35,85 @@
 # Schema cache: schema_name -> parsed schema dict
 _schema_cache: dict[str, dict] = {}
 
+# Prompt fragments that swap based on target_system.live_target. Worktree
+# mode is the default — code-evolution campaigns get an isolated git worktree
+# per iteration. Live-target mode is for running systems (clusters, services,
+# datasets) that the executor probes without per-iteration code mutation.
+# (The flag is `live_target` rather than `observational` to avoid colliding
+# with the existing "observe mode" in execute_analyze.md, which means
+# "the bundle has no code_changes arms.")
+_WORKTREE_EXECUTION_ENV = (
+    "You are running inside an isolated git worktree of the target system. "
+    "You own this worktree — reset it yourself with `git checkout -- .` "
+    "between conditions."
+)
+_LIVE_TARGET_EXECUTION_ENV = (
+    "You are running directly against a live target system, in its working "
+    "directory. There is no per-iteration git isolation, and your bundle "
+    "must contain no `code_changes` arms. Do not mutate the target system's "
+    "persistent state — your job is to probe, measure, and report. Treat "
+    "any files you create as scratch artifacts that belong under "
+    "`{{iter_dir}}/inputs/` or `{{iter_dir}}/results/`, not in the target "
+    "directory."
+)
+_WORKTREE_DESIGN_CONSTRAINT = (
+    "**Worktree isolation assumed.** The executor runs in a clean git "
+    "worktree. Each condition starts from clean state (`git checkout -- .` "
+    "runs between conditions). Design your experimental conditions assuming "
+    "this — don't include manual cleanup steps."
+)
+_LIVE_TARGET_DESIGN_CONSTRAINT = (
+    "**Live target system.** The executor runs directly against a running "
+    "system — no git worktree, no code-change arms. All arms must be pure "
+    "observations of system state (probes, metrics, log scrapes). Do not "
+    "include `code_changes` in any arm; do not assume mutation is possible "
+    "without explicit consent gates."
+)
+
+# Per-condition reset step in execute_analyze.md Phase 2. Worktree mode resets
+# tracked files between conditions; live-target mode has no checkout to
+# revert and instead reminds the agent not to mutate the live target.
+_WORKTREE_CONDITION_RESET = "Reset worktree: `git checkout -- .`"
+_LIVE_TARGET_CONDITION_RESET = (
+    "Do not mutate the target system between conditions. Any files you "
+    "wrote to the target directory during the previous condition must be "
+    "removed before the next one runs (this is your responsibility — "
+    "there is no automatic checkout)."
+)
+
+
+def validate_campaign(campaign: dict) -> None:
+    """Validate campaign config. Module-level so it can be called before any
+    dispatcher is constructed (e.g., from `run_iteration` in inline-agent mode,
+    where no LLMDispatcher is built and the staticmethod path is never taken).
+    """
+    ts = campaign.get("target_system")
+    if not isinstance(ts, dict):
+        raise ValueError(
+            "Campaign config missing 'target_system' section. "
+            "See examples/campaign.yaml for the expected format."
+        )
+    required = ["name", "description"]
+    missing = [k for k in required if k not in ts]
+    if missing:
+        raise ValueError(
+            f"Campaign 'target_system' missing required keys: {missing}. "
+            f"See examples/campaign.yaml for the expected format."
+        )
+    for field in ("observable_metrics", "controllable_knobs"):
+        val = ts.get(field)
+        if val is not None:
+            if not isinstance(val, list) or not all(isinstance(x, str) for x in val):
+                raise ValueError(
+                    f"Campaign 'target_system.{field}' must be a list of strings. "
+                    f"Got: {val!r}"
+                )
+    if "live_target" in ts and not isinstance(ts["live_target"], bool):
+        raise ValueError(
+            f"Campaign 'target_system.live_target' must be a bool. "
+            f"Got: {ts['live_target']!r}"
+        )
+
 
 class LLMDispatcher:
     """Dispatch agent roles to an LLM and produce schema-conformant artifacts."""
@@ -50,7 +129,7 @@ def __init__(
         completion_fn: Callable | None = None,
     ) -> None:
         self.work_dir = Path(work_dir)
-        self._validate_campaign(campaign)
+        validate_campaign(campaign)
         self.campaign = campaign
         self.model = model
         self.loader = PromptLoader(
@@ -84,29 +163,7 @@ def __init__(
                 dal,
             )
 
-    @staticmethod
-    def _validate_campaign(campaign: dict) -> None:
-        ts = campaign.get("target_system")
-        if not isinstance(ts, dict):
-            raise ValueError(
-                "Campaign config missing 'target_system' section. "
-                "See examples/campaign.yaml for the expected format."
-            )
-        required = ["name", "description"]
-        missing = [k for k in required if k not in ts]
-        if missing:
-            raise ValueError(
-                f"Campaign 'target_system' missing required keys: {missing}. "
-                f"See examples/campaign.yaml for the expected format."
-            )
-        for field in ("observable_metrics", "controllable_knobs"):
-            val = ts.get(field)
-            if val is not None:
-                if not isinstance(val, list) or not all(isinstance(x, str) for x in val):
-                    raise ValueError(
-                        f"Campaign 'target_system.{field}' must be a list of strings. "
-                        f"Got: {val!r}"
-                    )
+    _validate_campaign = staticmethod(validate_campaign)
 
     # ------------------------------------------------------------------
     # Public interface (satisfies Dispatcher protocol)
@@ -212,13 +269,17 @@ def _build_context(
         perspective: str | None,
     ) -> dict[str, str]:
         ts = self.campaign["target_system"]
+        live_target = bool(ts.get("live_target", False))
         ctx: dict[str, str] = {
             "target_system": ts["name"],
             "system_description": ts["description"],
             "observable_metrics": ", ".join(ts["observable_metrics"]) if ts.get("observable_metrics") else "Not specified — planner should discover from code",
             "controllable_knobs": ", ".join(ts["controllable_knobs"]) if ts.get("controllable_knobs") else "Not specified — planner should discover from code",
             "active_principles": self._format_principles(),
             "iteration": str(iteration),
+            "execution_environment": _LIVE_TARGET_EXECUTION_ENV if live_target else _WORKTREE_EXECUTION_ENV,
+            "worktree_constraint": _LIVE_TARGET_DESIGN_CONSTRAINT if live_target else _WORKTREE_DESIGN_CONSTRAINT,
+            "condition_reset": _LIVE_TARGET_CONDITION_RESET if live_target else _WORKTREE_CONDITION_RESET,
         }
 
         if phase == "design":
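
The comment in `run_iteration` is concerned with non-bool `live_target` values being coerced by `bool()`. A short sketch of that failure mode, assuming a campaign file in which the flag was quoted by mistake. `check_live_target` is a hypothetical, trimmed-down stand-in for the `live_target` branch of `validate_campaign`, not a function from the PR.

```python
def check_live_target(ts: dict) -> None:
    """Reject non-bool live_target values (same check as validate_campaign)."""
    if "live_target" in ts and not isinstance(ts["live_target"], bool):
        raise ValueError(
            "Campaign 'target_system.live_target' must be a bool. "
            f"Got: {ts['live_target']!r}"
        )


# A quoted YAML value such as `live_target: "false"` loads as a string.
quoted = {"name": "demo", "description": "demo", "live_target": "false"}

# Without validation, bool() treats any non-empty string as True, so the
# campaign would silently run in live-target mode.
coerced = bool(quoted.get("live_target", False))

try:
    check_live_target(quoted)
    rejected = False
except ValueError:
    rejected = True
```

Here `coerced` ends up `True` even though the author wrote "false", which is why the explicit type check has to run before either `bool(...)` call in the diff.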

diff --git a/orchestrator/schemas/campaign.schema.yaml b/orchestrator/schemas/campaign.schema.yaml
@@ -53,6 +53,9 @@ properties:
         type: ["string", "null"]
         minLength: 1
         description: "Path to target system git repo. Used by CLIDispatcher for code-access agents. If set, experiments run in isolated worktrees."
+      live_target:
+        type: boolean
+        description: "If true, the executor runs directly in repo_path with no per-iteration git worktree. Use for campaigns that probe a running system (cluster, service, dataset) where there is no code to evolve. Bundles must contain no code_changes arms."
 
   models:
     type: object

diff --git a/prompts/methodology/design.md b/prompts/methodology/design.md
@@ -158,7 +158,7 @@ Now design a hypothesis bundle based on what you actually observed and verified:
 - Predictions must be directional, falsifiable, and reference specific observable metrics. Do not invent arbitrary numeric thresholds unless campaign.yaml specifies them.
 - Base all experiment parameters on verified system behavior — if you didn't probe it, don't assume it.
 - **No `sed`/`awk` for code changes.** When describing code modifications in problem framing or bundle arms, describe the *intent* (what to change and why). The executor agent will implement changes properly via file edits, verify they compile, and create reusable `git diff` patches. Never suggest inline shell regex as an implementation strategy.
-- **Worktree isolation assumed.** The executor runs in a clean git worktree. Each condition starts from clean state (`git checkout -- .` runs between conditions). Design your experimental conditions assuming this — don't include manual cleanup steps.
+- {{worktree_constraint}}
 
 ## Output — Write Files Directly
 

diff --git a/prompts/methodology/execute_analyze.md b/prompts/methodology/execute_analyze.md
@@ -1,6 +1,6 @@
 You are a scientific executor for the Nous hypothesis-driven experimentation framework.
 
-You have **shell access**. You are running inside an isolated git worktree of the target system. You own this worktree — reset it yourself with `git checkout -- .` between conditions.
+You have **shell access**. {{execution_environment}}
 
 Your job has FIVE phases — all in one session with full context:
 1. **Prepare** — build, create patches, validate ALL commands
@@ -105,7 +105,7 @@ arms:
 ```
 
 **Important:**
-- All output paths MUST use absolute paths under `{{iter_dir}}/results/`. Do NOT use relative paths — the experiment runs in a worktree that gets cleaned up.
+- All output paths MUST use absolute paths under `{{iter_dir}}/results/`. Do NOT use relative paths — only files under `{{iter_dir}}/` are guaranteed to persist past this session.
 - Create per-arm result subdirectories before writing output: `mkdir -p {{iter_dir}}/results/<arm_id>` (the top-level `results/` already exists, but per-arm subdirectories like `results/h-main/` do not).
 - If you create ANY input files for the experiment (config files, workload specs, policy definitions, parameter files), write them to `{{iter_dir}}/inputs/` and list them in the condition's `inputs` array. Do NOT write input files to `/tmp/` or other temporary locations — they will be lost and the experiment will not be reproducible.
 
@@ -114,13 +114,13 @@ arms:
 Run the experiment plan you wrote in Step 4 — execute every command exactly as written. The plan is the source of truth.
 
 For each condition:
-1. Reset worktree: `git checkout -- .`
+1. {{condition_reset}}
 2. Run the `cmd` from the plan
 3. Verify the `output` file was created at the expected path
 
 After each baseline+treatment pair with the same seed, compare key metrics. If they are byte-identical, STOP and investigate — the patch may not be affecting the code path.
 
-**All results must land in `{{iter_dir}}/results/`.** The worktree is temporary — anything written there will be lost.
+**All results must land in `{{iter_dir}}/results/`.** Only files under `{{iter_dir}}/` are guaranteed to persist — anything written elsewhere may be lost.
 
 ## Phase 3: Analyze and Write Findings
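
The prompt changes replace hard-coded worktree wording with the `{{worktree_constraint}}`, `{{execution_environment}}`, and `{{condition_reset}}` placeholders that `_build_context` fills in. A minimal sketch of that swap, assuming plain `{{name}}` string replacement. The `render` and `build_context` helpers and the shortened live-target text are hypothetical; the real `PromptLoader` is not shown in this diff and may behave differently, for instance in how it treats the nested `{{iter_dir}}` placeholder inside `_LIVE_TARGET_EXECUTION_ENV`.

```python
import re

WORKTREE_RESET = "Reset worktree: `git checkout -- .`"
# Shortened for the example; the PR's fragment is longer.
LIVE_TARGET_RESET = "Do not mutate the target system between conditions."


def render(template: str, ctx: dict) -> str:
    """Replace each {{key}} with ctx[key]; leave unknown keys untouched."""
    def repl(match: re.Match) -> str:
        return ctx.get(match.group(1), match.group(0))
    return re.sub(r"\{\{(\w+)\}\}", repl, template)


def build_context(target_system: dict) -> dict:
    # Same selection rule as _build_context: live-target text when the flag
    # is set, worktree text otherwise.
    live = bool(target_system.get("live_target", False))
    return {"condition_reset": LIVE_TARGET_RESET if live else WORKTREE_RESET}


step = "1. {{condition_reset}}"
default_step = render(step, build_context({"name": "demo"}))
live_step = render(step, build_context({"name": "demo", "live_target": True}))
```

With the flag unset, the rendered step matches the line the diff removed from `execute_analyze.md`, so default campaigns should see the same prompt text as before.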