
Commit 3499dbe

namasl and claude authored
feat: observational mode for live-system debugging campaigns (#220)
* feat: observational mode for live-system debugging campaigns

  Add target_system.observational flag so campaigns whose target is a live system (cluster, service, dataset) can use repo_path purely to grant the agent shell access, without per-iteration git worktree isolation.

  When observational=true:
  - run_iteration skips create_experiment_worktree and runs the executor directly in repo_path. Prevents the FileNotFoundError "Not a git repository" failure mode and avoids polluting a non-code target with per-iteration orphan branches and .nous-experiments/ subdirs.
  - The design and execute_analyze prompts swap their worktree paragraphs for observational equivalents via {{execution_environment}} and {{worktree_constraint}} placeholders, so the agent is told it is probing a live target rather than mutating an isolated worktree.

  Default behavior is unchanged: the flag is opt-in and the worktree path remains the default for code-evolution campaigns.

  Tested: 10 new tests + 337 existing tests pass.

* fix: allow target_system.observational in campaign schema

  The observational flag was wired into validation, prompts, and the iteration loop but the JSON schema still rejected it as an unknown property, so campaigns failed at load time.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* review: address PR #220 review feedback

  - Fix prompt body / lead-paragraph contradiction in execute_analyze.md. The lead said "no per-iteration git isolation" in observational mode, but Phase 2 still hardcoded `git checkout -- .` between conditions (which would fail with no .git) and framed result-path warnings as "the worktree is temporary." Replace the reset step with a new {{condition_reset}} placeholder and rephrase the persistence note to be accurate in both modes.
  - Fix validation bypass: extract _validate_campaign to a module-level validate_campaign() and call it at the top of run_iteration. The staticmethod was only invoked from LLMDispatcher.__init__, so inline-agent mode (which never builds an LLMDispatcher) silently coerced non-bool observational values via bool() further down.
  - Add regression test that create_experiment_worktree IS called when observational=False (existing tests would all pass if the gate were inverted).
  - Loosen brittle prompt-text assertions: import the fragment constants and assert constant identity / containment instead of substrings, so copy-edits to the prompt text don't churn six tests.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* rename: observational → live_target per reviewer feedback

  Reviewer flagged that "observational" collides with the existing observe-mode in execute_analyze.md, which means "the bundle has no code_changes arms", a bundle-level property, not the infra-level concern of whether to skip worktree creation. The new flag controls executor environment (live system vs. isolated worktree), so `live_target` is a more accurate name.

  Mechanical rename across iteration.py, llm_dispatch.py, campaign.schema.yaml, and the test module.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: document live_target campaigns in README and quickstart

  Reviewer asked for user-facing docs on when and how to use live_target: true so the feature is discoverable without reading the PR description or schema. Adds a quickstart section with an example campaign and contrasts live_target (campaign-level, no worktree, all arms must be probes) with observe-mode arms (bundle-level, worktree still created). README points to the new section.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
1 parent dd47128 commit 3499dbe

8 files changed

Lines changed: 458 additions & 30 deletions


README.md

Lines changed: 2 additions & 0 deletions
@@ -123,6 +123,8 @@ When `repo_path` is set, the campaign directory is created inside the target rep

 The planner explores the codebase to discover metrics, knobs, and execution methods. You can optionally provide `observable_metrics` and `controllable_knobs` as hints — see [examples/campaign.yaml](examples/campaign.yaml) for all options.

+If your target is a *running* system rather than a codebase (a cluster, a deployed service, a scratch directory that isn't a git repo), set `target_system.live_target: true`. The executor then runs directly in `repo_path` with no per-iteration `git worktree`, and the planner is told up front that arms must be probes — see [docs/quickstart.md#live-target-campaigns-live_target-true](docs/quickstart.md#live-target-campaigns-live_target-true) for details.
+
 ### 5. Run a campaign

 ```bash

docs/quickstart.md

Lines changed: 37 additions & 0 deletions
@@ -125,6 +125,43 @@ After a campaign, your working directory contains:
 - **`runs/iter-N/inputs/`** — Agent-created input files (configs, workloads)
 - **`runs/iter-N/results/`** — Experiment output files

+## Live-target campaigns (`live_target: true`)
+
+By default Nous treats `repo_path` as a git repo and creates a fresh `git worktree` per iteration so that any source-code patches are isolated. For some campaigns there is no codebase to evolve — the thing you want to study is a *running* system: a Kubernetes cluster, a deployed service, a dataset on disk, a non-git scratch directory. Setting `live_target: true` tells Nous to skip worktree creation and run the executor directly inside `repo_path`.
+
+Use it when:
+
+- The target is a live system you are probing, not a codebase you are mutating (e.g. a GPU cluster, a production-like service, a workload generator).
+- `repo_path` points at a directory that is not a git repo, or is a git repo whose working tree must not be branched.
+- The bundle should only contain probe-style arms (config tweaks, command-line invocations, observation runs) — never `code_changes`.
+
+Example:
+
+```yaml
+research_question: >
+  Why does p99 latency spike when the cluster autoscaler kicks in?
+
+target_system:
+  name: "Staging GPU cluster"
+  description: >
+    Live Kubernetes cluster running our inference workload.
+    The agent probes the cluster via kubectl and Prometheus; it does
+    not modify source code.
+  repo_path: /scratch/cluster-probe # any working directory; need not be a git repo
+  live_target: true
+
+prompts:
+  methodology_layer: "prompts/methodology"
+  domain_adapter_layer: null
+```
+
+How `live_target` differs from regular observe-mode arms:
+
+- **Observe mode** is a *bundle-level* property — an individual arm has no `code_changes`, so the executor skips patching and just runs commands. The campaign can still mix observe arms and evolve arms in the same bundle, and a worktree is still created.
+- **`live_target: true`** is a *campaign-level* property — it controls the *executor environment* (no worktree, run in `repo_path` directly) and tells the planner up front that the target is a shared running system, so every arm must be a probe. Bundles with `code_changes` arms are incoherent in this mode.
+
+Pick `live_target: true` when there is nothing meaningful to branch from; pick observe-mode arms when you have a real codebase but a particular iteration only needs to measure, not patch.
+
 ## Choosing a model

 Defaults (from `defaults.yaml`):
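
Reviewer note: the quickstart text above calls bundles with `code_changes` arms incoherent under `live_target: true`, but in the hunks shown on this page that rule is only communicated through docs and prompt text. Below is a minimal sketch of a programmatic guard. The bundle shape (an `arms` list whose entries may carry `arm_id` and `code_changes`) and both function names are assumptions made for illustration; they are not part of this commit.

```python
def code_change_arms(bundle: dict) -> list[str]:
    """Return ids of arms that declare a non-empty `code_changes` entry.

    Assumed (hypothetical) bundle shape:
    {"arms": [{"arm_id": str, "code_changes": ...}, ...]}
    """
    offending = []
    for index, arm in enumerate(bundle.get("arms", [])):
        if arm.get("code_changes"):
            # Fall back to a positional label when the arm has no id.
            offending.append(str(arm.get("arm_id", f"arm[{index}]")))
    return offending


def check_bundle_for_live_target(campaign: dict, bundle: dict) -> None:
    """Raise ValueError if a live-target campaign's bundle tries to patch code."""
    live_target = campaign.get("target_system", {}).get("live_target", False)
    if live_target is not True:
        return  # worktree mode: code_changes arms are allowed
    offending = code_change_arms(bundle)
    if offending:
        raise ValueError(
            f"live_target campaigns must not contain code_changes arms: {offending}"
        )
```

Calling `check_bundle_for_live_target(campaign, bundle)` right after the design phase would fail fast instead of relying on the planner to follow the prompt. Whether that is worth adding is a judgment call for the maintainers.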

orchestrator/iteration.py

Lines changed: 17 additions & 1 deletion
@@ -339,6 +339,13 @@ def run_iteration(
     Returns:
         An IterationOutcome value: COMPLETED, CONTINUE, ABORTED, or REDESIGN.
     """
+    # Validate the campaign once, up front. The staticmethod on LLMDispatcher
+    # is also called from its constructor, but inline-agent mode never builds
+    # an LLMDispatcher — without this call, a non-bool `live_target` value
+    # would slip past validation and silently coerce via bool() below.
+    from orchestrator.llm_dispatch import validate_campaign
+    validate_campaign(campaign)
+
     engine = Engine(work_dir)
     repo_path = campaign.get("target_system", {}).get("repo_path")

@@ -454,7 +461,10 @@ def _max_turns_for(phase_key: str) -> int:
         cli_dispatcher.model = _model_for("execute_analyze")
         cli_dispatcher.max_turns = _max_turns_for("execute_analyze")
     exec_dispatcher = cli_dispatcher or llm_dispatcher
-    if repo_path:
+    live_target = bool(
+        campaign.get("target_system", {}).get("live_target", False)
+    )
+    if repo_path and not live_target:
         from orchestrator.worktree import (
             create_experiment_worktree,
             remove_experiment_worktree,
@@ -464,6 +474,12 @@ def _max_turns_for(phase_key: str) -> int:
         )
         (iter_dir / ".experiment_id").write_text(experiment_id)
         print(f"  Experiment worktree: {experiment_dir}")
+    elif repo_path:
+        # Live-target mode: executor runs directly in repo_path. The
+        # target system is running (cluster, service, dataset) and there
+        # is nothing to isolate — bundles must contain no code_changes arms.
+        experiment_dir = Path(repo_path)
+        print(f"  Live target: executor runs in {experiment_dir}")
     if cli_dispatcher:
         import contextlib
         ctx = cli_dispatcher.override_cwd(experiment_dir) if experiment_dir else contextlib.nullcontext()
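
Reviewer note: the comment in the first hunk warns that a non-bool `live_target` would "silently coerce via bool()". The sketch below mirrors the `if repo_path and not live_target` / `elif repo_path` branches and shows why that matters: a quoted YAML value such as `"false"` is a non-empty string, so `bool("false")` is `True`. `choose_executor_dir` and `fake_worktree` are hypothetical stand-ins; the real `create_experiment_worktree` signature and the worktree directory layout are not shown in this diff.

```python
from pathlib import Path
from typing import Callable, Optional


def choose_executor_dir(
    campaign: dict,
    make_worktree: Callable[[str], Path],
) -> Optional[Path]:
    """Mirror the branch structure of the hunk above (hypothetical helper)."""
    ts = campaign.get("target_system", {})
    repo_path = ts.get("repo_path")
    live_target = bool(ts.get("live_target", False))
    if repo_path and not live_target:
        return make_worktree(repo_path)  # isolated per-iteration worktree
    if repo_path:
        return Path(repo_path)  # live target: run in place
    return None  # no repo_path: no working-directory override


calls: list[str] = []


def fake_worktree(repo: str) -> Path:
    """Stand-in for create_experiment_worktree; the path is illustrative only."""
    calls.append(repo)
    return Path(repo) / ".nous-experiments" / "exp-1"


default = choose_executor_dir({"target_system": {"repo_path": "/repo"}}, fake_worktree)
live = choose_executor_dir(
    {"target_system": {"repo_path": "/repo", "live_target": True}}, fake_worktree
)
# Unvalidated string: bool("false") is True, so this would wrongly run in place.
coerced = choose_executor_dir(
    {"target_system": {"repo_path": "/repo", "live_target": "false"}}, fake_worktree
)
```

Only the first call reaches `fake_worktree`. The third call is the case that `validate_campaign(campaign)` at the top of `run_iteration` is meant to reject before the gate is ever evaluated.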

orchestrator/llm_dispatch.py

Lines changed: 85 additions & 24 deletions
@@ -35,6 +35,85 @@
 # Schema cache: schema_name -> parsed schema dict
 _schema_cache: dict[str, dict] = {}

+# Prompt fragments that swap based on target_system.live_target. Worktree
+# mode is the default — code-evolution campaigns get an isolated git worktree
+# per iteration. Live-target mode is for running systems (clusters, services,
+# datasets) that the executor probes without per-iteration code mutation.
+# (The flag is `live_target` rather than `observational` to avoid colliding
+# with the existing "observe mode" in execute_analyze.md, which means
+# "the bundle has no code_changes arms.")
+_WORKTREE_EXECUTION_ENV = (
+    "You are running inside an isolated git worktree of the target system. "
+    "You own this worktree — reset it yourself with `git checkout -- .` "
+    "between conditions."
+)
+_LIVE_TARGET_EXECUTION_ENV = (
+    "You are running directly against a live target system, in its working "
+    "directory. There is no per-iteration git isolation, and your bundle "
+    "must contain no `code_changes` arms. Do not mutate the target system's "
+    "persistent state — your job is to probe, measure, and report. Treat "
+    "any files you create as scratch artifacts that belong under "
+    "`{{iter_dir}}/inputs/` or `{{iter_dir}}/results/`, not in the target "
+    "directory."
+)
+_WORKTREE_DESIGN_CONSTRAINT = (
+    "**Worktree isolation assumed.** The executor runs in a clean git "
+    "worktree. Each condition starts from clean state (`git checkout -- .` "
+    "runs between conditions). Design your experimental conditions assuming "
+    "this — don't include manual cleanup steps."
+)
+_LIVE_TARGET_DESIGN_CONSTRAINT = (
+    "**Live target system.** The executor runs directly against a running "
+    "system — no git worktree, no code-change arms. All arms must be pure "
+    "observations of system state (probes, metrics, log scrapes). Do not "
+    "include `code_changes` in any arm; do not assume mutation is possible "
+    "without explicit consent gates."
+)
+
+# Per-condition reset step in execute_analyze.md Phase 2. Worktree mode resets
+# tracked files between conditions; live-target mode has no checkout to
+# revert and instead reminds the agent not to mutate the live target.
+_WORKTREE_CONDITION_RESET = "Reset worktree: `git checkout -- .`"
+_LIVE_TARGET_CONDITION_RESET = (
+    "Do not mutate the target system between conditions. Any files you "
+    "wrote to the target directory during the previous condition must be "
+    "removed before the next one runs (this is your responsibility — "
+    "there is no automatic checkout)."
+)
+
+
+def validate_campaign(campaign: dict) -> None:
+    """Validate campaign config. Module-level so it can be called before any
+    dispatcher is constructed (e.g., from `run_iteration` in inline-agent mode,
+    where no LLMDispatcher is built and the staticmethod path is never taken).
+    """
+    ts = campaign.get("target_system")
+    if not isinstance(ts, dict):
+        raise ValueError(
+            "Campaign config missing 'target_system' section. "
+            "See examples/campaign.yaml for the expected format."
+        )
+    required = ["name", "description"]
+    missing = [k for k in required if k not in ts]
+    if missing:
+        raise ValueError(
+            f"Campaign 'target_system' missing required keys: {missing}. "
+            f"See examples/campaign.yaml for the expected format."
+        )
+    for field in ("observable_metrics", "controllable_knobs"):
+        val = ts.get(field)
+        if val is not None:
+            if not isinstance(val, list) or not all(isinstance(x, str) for x in val):
+                raise ValueError(
+                    f"Campaign 'target_system.{field}' must be a list of strings. "
+                    f"Got: {val!r}"
+                )
+    if "live_target" in ts and not isinstance(ts["live_target"], bool):
+        raise ValueError(
+            f"Campaign 'target_system.live_target' must be a bool. "
+            f"Got: {ts['live_target']!r}"
+        )
+

 class LLMDispatcher:
     """Dispatch agent roles to an LLM and produce schema-conformant artifacts."""
@@ -50,7 +129,7 @@ def __init__(
         completion_fn: Callable | None = None,
     ) -> None:
         self.work_dir = Path(work_dir)
-        self._validate_campaign(campaign)
+        validate_campaign(campaign)
         self.campaign = campaign
         self.model = model
         self.loader = PromptLoader(
@@ -84,29 +163,7 @@ def __init__(
             dal,
         )

-    @staticmethod
-    def _validate_campaign(campaign: dict) -> None:
-        ts = campaign.get("target_system")
-        if not isinstance(ts, dict):
-            raise ValueError(
-                "Campaign config missing 'target_system' section. "
-                "See examples/campaign.yaml for the expected format."
-            )
-        required = ["name", "description"]
-        missing = [k for k in required if k not in ts]
-        if missing:
-            raise ValueError(
-                f"Campaign 'target_system' missing required keys: {missing}. "
-                f"See examples/campaign.yaml for the expected format."
-            )
-        for field in ("observable_metrics", "controllable_knobs"):
-            val = ts.get(field)
-            if val is not None:
-                if not isinstance(val, list) or not all(isinstance(x, str) for x in val):
-                    raise ValueError(
-                        f"Campaign 'target_system.{field}' must be a list of strings. "
-                        f"Got: {val!r}"
-                    )
+    _validate_campaign = staticmethod(validate_campaign)

     # ------------------------------------------------------------------
     # Public interface (satisfies Dispatcher protocol)
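
Reviewer note: the hunk above replaces the method body with `_validate_campaign = staticmethod(validate_campaign)`, which keeps existing `LLMDispatcher._validate_campaign(...)` call sites working while the logic lives at module level. Here is a stripped-down sketch of that aliasing pattern; `validate` and `Dispatcher` are hypothetical names, not the project's classes.

```python
def validate(config: dict) -> None:
    """Module-level validator, callable without constructing any class."""
    if "name" not in config:
        raise ValueError("missing 'name'")


class Dispatcher:
    # Backwards-compatible alias. Wrapping in staticmethod stops Python from
    # binding the instance as the first argument on instance access.
    _validate = staticmethod(validate)

    def __init__(self, config: dict) -> None:
        validate(config)
        self.config = config


dispatcher = Dispatcher({"name": "demo"})
via_class = Dispatcher._validate({"name": "a"})
via_instance = dispatcher._validate({"name": "b"})
```

Without the `staticmethod` wrapper, `dispatcher._validate(cfg)` would pass the instance as `config` and fail with a `TypeError` for the extra positional argument.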
@@ -212,13 +269,17 @@ def _build_context(
         perspective: str | None,
     ) -> dict[str, str]:
         ts = self.campaign["target_system"]
+        live_target = bool(ts.get("live_target", False))
         ctx: dict[str, str] = {
             "target_system": ts["name"],
             "system_description": ts["description"],
             "observable_metrics": ", ".join(ts["observable_metrics"]) if ts.get("observable_metrics") else "Not specified — planner should discover from code",
             "controllable_knobs": ", ".join(ts["controllable_knobs"]) if ts.get("controllable_knobs") else "Not specified — planner should discover from code",
             "active_principles": self._format_principles(),
             "iteration": str(iteration),
+            "execution_environment": _LIVE_TARGET_EXECUTION_ENV if live_target else _WORKTREE_EXECUTION_ENV,
+            "worktree_constraint": _LIVE_TARGET_DESIGN_CONSTRAINT if live_target else _WORKTREE_DESIGN_CONSTRAINT,
+            "condition_reset": _LIVE_TARGET_CONDITION_RESET if live_target else _WORKTREE_CONDITION_RESET,
         }

         if phase == "design":
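
Reviewer note: `_build_context` now picks one fragment per placeholder, and `_LIVE_TARGET_EXECUTION_ENV` itself contains `{{iter_dir}}`. Whether the real `PromptLoader` resolves a placeholder that arrives inside a substituted fragment is not visible from this diff. The sketch below is a hypothetical renderer (not the project's loader) that runs a second pass for exactly that case; the fragment text is abbreviated.

```python
import re

_PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render(template: str, ctx: dict[str, str], max_passes: int = 2) -> str:
    """Substitute {{name}} placeholders; unknown names are left untouched.

    Runs up to `max_passes` passes so a placeholder introduced by a
    substituted fragment (e.g. {{iter_dir}}) is also resolved.
    """
    text = template
    for _ in range(max_passes):
        rendered = _PLACEHOLDER.sub(lambda m: ctx.get(m.group(1), m.group(0)), text)
        if rendered == text:
            break
        text = rendered
    return text


def fragments_for(live_target: bool) -> dict[str, str]:
    """Pick one fragment per placeholder, in the spirit of _build_context."""
    if live_target:
        return {
            "execution_environment": (
                "Live target. Write scratch files under `{{iter_dir}}/results/`."
            ),
            "condition_reset": "Do not mutate the target system between conditions.",
        }
    return {
        "execution_environment": "You are running inside an isolated git worktree.",
        "condition_reset": "Reset worktree: `git checkout -- .`",
    }


template = "You have **shell access**. {{execution_environment}}\n1. {{condition_reset}}"
ctx = {"iter_dir": "/work/runs/iter-1", **fragments_for(live_target=True)}
prompt = render(template, ctx)
```

With a single-pass renderer the live-target prompt would still contain a literal `{{iter_dir}}`, so it may be worth a test that asserts no `{{` survives in the rendered `execute_analyze` prompt in either mode.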

orchestrator/schemas/campaign.schema.yaml

Lines changed: 3 additions & 0 deletions
@@ -53,6 +53,9 @@ properties:
         type: ["string", "null"]
         minLength: 1
         description: "Path to target system git repo. Used by CLIDispatcher for code-access agents. If set, experiments run in isolated worktrees."
+      live_target:
+        type: boolean
+        description: "If true, the executor runs directly in repo_path with no per-iteration git worktree. Use for campaigns that probe a running system (cluster, service, dataset) where there is no code to evolve. Bundles must contain no code_changes arms."

   metadata:
     type: object
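
Reviewer note: the commit message says the schema "rejected it as an unknown property" before this hunk, which suggests `target_system` uses a closed property set. The sketch below is a hand-rolled, stdlib-only stand-in for that behaviour, not the project's schema validator. The key set is assembled from names that appear elsewhere on this page and may be incomplete.

```python
ALLOWED_TARGET_SYSTEM_KEYS = {
    "name",
    "description",
    "repo_path",
    "observable_metrics",
    "controllable_knobs",
    "live_target",  # the key this hunk adds
}


def target_system_errors(
    ts: dict, allowed: set[str] = ALLOWED_TARGET_SYSTEM_KEYS
) -> list[str]:
    """Return human-readable problems; an empty list means the section passes."""
    errors = [f"unknown property: {key}" for key in sorted(ts.keys() - allowed)]
    if "live_target" in ts and not isinstance(ts["live_target"], bool):
        errors.append("live_target must be a boolean")
    return errors


ts = {"name": "Staging GPU cluster", "description": "demo", "live_target": True}
before = target_system_errors(ts, ALLOWED_TARGET_SYSTEM_KEYS - {"live_target"})
after = target_system_errors(ts)
```

`before` reproduces the load-time rejection described in the second commit bullet, and `after` shows the same campaign passing once the key is declared.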

prompts/methodology/design.md

Lines changed: 1 addition & 1 deletion
@@ -158,7 +158,7 @@ Now design a hypothesis bundle based on what you actually observed and verified:
 - Predictions must be directional, falsifiable, and reference specific observable metrics. Do not invent arbitrary numeric thresholds unless campaign.yaml specifies them.
 - Base all experiment parameters on verified system behavior — if you didn't probe it, don't assume it.
 - **No `sed`/`awk` for code changes.** When describing code modifications in problem framing or bundle arms, describe the *intent* (what to change and why). The executor agent will implement changes properly via file edits, verify they compile, and create reusable `git diff` patches. Never suggest inline shell regex as an implementation strategy.
-- **Worktree isolation assumed.** The executor runs in a clean git worktree. Each condition starts from clean state (`git checkout -- .` runs between conditions). Design your experimental conditions assuming this — don't include manual cleanup steps.
+- {{worktree_constraint}}

 ## Output — Write Files Directly

prompts/methodology/execute_analyze.md

Lines changed: 4 additions & 4 deletions
@@ -1,6 +1,6 @@
 You are a scientific executor for the Nous hypothesis-driven experimentation framework.

-You have **shell access**. You are running inside an isolated git worktree of the target system. You own this worktree — reset it yourself with `git checkout -- .` between conditions.
+You have **shell access**. {{execution_environment}}

 Your job has FIVE phases — all in one session with full context:
 1. **Prepare** — build, create patches, validate ALL commands
@@ -105,7 +105,7 @@ arms:
 ```

 **Important:**
-- All output paths MUST use absolute paths under `{{iter_dir}}/results/`. Do NOT use relative paths — the experiment runs in a worktree that gets cleaned up.
+- All output paths MUST use absolute paths under `{{iter_dir}}/results/`. Do NOT use relative paths — only files under `{{iter_dir}}/` are guaranteed to persist past this session.
 - Create per-arm result subdirectories before writing output: `mkdir -p {{iter_dir}}/results/<arm_id>` (the top-level `results/` already exists, but per-arm subdirectories like `results/h-main/` do not).
 - If you create ANY input files for the experiment (config files, workload specs, policy definitions, parameter files), write them to `{{iter_dir}}/inputs/` and list them in the condition's `inputs` array. Do NOT write input files to `/tmp/` or other temporary locations — they will be lost and the experiment will not be reproducible.

@@ -114,13 +114,13 @@ arms:
 Run the experiment plan you wrote in Step 4 — execute every command exactly as written. The plan is the source of truth.

 For each condition:
-1. Reset worktree: `git checkout -- .`
+1. {{condition_reset}}
 2. Run the `cmd` from the plan
 3. Verify the `output` file was created at the expected path

 After each baseline+treatment pair with the same seed, compare key metrics. If they are byte-identical, STOP and investigate — the patch may not be affecting the code path.

-**All results must land in `{{iter_dir}}/results/`.** The worktree is temporary — anything written there will be lost.
+**All results must land in `{{iter_dir}}/results/`.** Only files under `{{iter_dir}}/` are guaranteed to persist — anything written elsewhere may be lost.

 ## Phase 3: Analyze and Write Findings
