feat: observational mode for live-system debugging campaigns (#220)
* feat: observational mode for live-system debugging campaigns
Add target_system.observational flag so campaigns whose target is a live
system (cluster, service, dataset) can use repo_path purely to grant the
agent shell access — without per-iteration git worktree isolation.
When observational=true:
- run_iteration skips create_experiment_worktree and runs the executor
directly in repo_path. Prevents the FileNotFoundError "Not a git
repository" failure mode and avoids polluting a non-code target with
per-iteration orphan branches and .nous-experiments/ subdirs.
- The design and execute_analyze prompts swap their worktree paragraphs
for observational equivalents via {{execution_environment}} and
{{worktree_constraint}} placeholders, so the agent is told it is
probing a live target rather than mutating an isolated worktree.
Default behavior is unchanged — the flag is opt-in and the worktree
path remains the default for code-evolution campaigns.
Tested: 10 new tests + 337 existing tests pass.
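The gate described above can be sketched as follows. This is a minimal illustration, not the code in iteration.py: `resolve_executor_cwd` and the injected `create_worktree` callable are hypothetical names, and the flag is read as `live_target` (its name after the rename later in this commit; this message still calls it `observational`).

```python
from pathlib import Path
from typing import Callable


def resolve_executor_cwd(
    campaign: dict,
    iteration: int,
    create_worktree: Callable[[Path, int], Path],
) -> Path:
    """Return the directory the executor should run in for this iteration."""
    target = campaign.get("target_system") or {}
    repo_path = Path(target["repo_path"])
    if target.get("live_target", False):
        # Live target: run in place. No worktree, no orphan branches,
        # and no git commands against a directory that may not be a repo.
        return repo_path
    # Default (code-evolution campaigns): isolate each iteration.
    return create_worktree(repo_path, iteration)
```

Passing the worktree factory in as an argument keeps the sketch testable without a real git repository; a test can assert that the factory is never called when the flag is set.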
* fix: allow target_system.observational in campaign schema
The observational flag was wired into validation, prompts, and the
iteration loop but the JSON schema still rejected it as an unknown
property, so campaigns failed at load time.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* review: address PR #220 review feedback
- Fix prompt body / lead-paragraph contradiction in execute_analyze.md.
The lead said "no per-iteration git isolation" in observational mode,
but Phase 2 still hardcoded `git checkout -- .` between conditions
(which would fail with no .git) and framed result-path warnings as
"the worktree is temporary." Replace the reset step with a new
{{condition_reset}} placeholder and rephrase the persistence note
to be accurate in both modes.
- Fix validation bypass: extract _validate_campaign to a module-level
validate_campaign() and call it at the top of run_iteration. The
staticmethod was only invoked from LLMDispatcher.__init__, so inline-
agent mode (which never builds an LLMDispatcher) silently coerced
non-bool observational values via bool() further down.
- Add regression test that create_experiment_worktree IS called when
observational=False (existing tests would all pass if the gate were
inverted).
- Loosen brittle prompt-text assertions: import the fragment constants
and assert constant identity / containment instead of substrings,
so copy-edits to the prompt text don't churn six tests.
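The validation-bypass fix can be illustrated with a small sketch. Only the name `validate_campaign` and the idea of calling it at the top of `run_iteration` come from the notes above; the bodies below are assumptions. The hazard is that `bool("false")` is `True`, so coercion would silently enable the mode.

```python
def validate_campaign(campaign: dict) -> None:
    """Reject a non-boolean flag instead of letting bool() coerce it later."""
    target = campaign.get("target_system") or {}
    flag = target.get("live_target", False)
    if not isinstance(flag, bool):
        raise ValueError(
            "target_system.live_target must be a boolean, "
            f"got {type(flag).__name__}"
        )


def run_iteration(campaign: dict) -> bool:
    # Validate first so every entry point is covered, including
    # paths that never construct a dispatcher object.
    validate_campaign(campaign)
    target = campaign.get("target_system") or {}
    return target.get("live_target", False)
```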
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* rename: observational → live_target per reviewer feedback
Reviewer flagged that "observational" collides with the existing
observe-mode in execute_analyze.md, which means "the bundle has no
code_changes arms" — a bundle-level property, not the infra-level
concern of whether to skip worktree creation.
The new flag controls executor environment (live system vs. isolated
worktree), so `live_target` is a more accurate name. Mechanical rename
across iteration.py, llm_dispatch.py, campaign.schema.yaml, and the
test module.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs: document live_target campaigns in README and quickstart
Reviewer asked for user-facing docs on when and how to use
live_target: true so the feature is discoverable without reading the
PR description or schema. Adds a quickstart section with an example
campaign and contrasts live_target (campaign-level, no worktree, all
arms must be probes) with observe-mode arms (bundle-level, worktree
still created). README points to the new section.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
README.md (2 additions, 0 deletions)
@@ -123,6 +123,8 @@ When `repo_path` is set, the campaign directory is created inside the target rep
 The planner explores the codebase to discover metrics, knobs, and execution methods. You can optionally provide `observable_metrics` and `controllable_knobs` as hints — see [examples/campaign.yaml](examples/campaign.yaml) for all options.
 
+If your target is a *running* system rather than a codebase (a cluster, a deployed service, a scratch directory that isn't a git repo), set `target_system.live_target: true`. The executor then runs directly in `repo_path` with no per-iteration `git worktree`, and the planner is told up front that arms must be probes — see [docs/quickstart.md#live-target-campaigns-live_target-true](docs/quickstart.md#live-target-campaigns-live_target-true) for details.
docs/quickstart.md
By default Nous treats `repo_path` as a git repo and creates a fresh `git worktree` per iteration so that any source-code patches are isolated. For some campaigns there is no codebase to evolve — the thing you want to study is a *running* system: a Kubernetes cluster, a deployed service, a dataset on disk, a non-git scratch directory. Setting `live_target: true` tells Nous to skip worktree creation and run the executor directly inside `repo_path`.

Use it when:

- The target is a live system you are probing, not a codebase you are mutating (e.g. a GPU cluster, a production-like service, a workload generator).
- `repo_path` points at a directory that is not a git repo, or is a git repo whose working tree must not be branched.
- The bundle should only contain probe-style arms (config tweaks, command-line invocations, observation runs) — never `code_changes`.

Example:

```yaml
research_question: >
  Why does p99 latency spike when the cluster autoscaler kicks in?

target_system:
  name: "Staging GPU cluster"
  description: >
    Live Kubernetes cluster running our inference workload.
    The agent probes the cluster via kubectl and Prometheus; it does
    not modify source code.
  repo_path: /scratch/cluster-probe  # any working directory; need not be a git repo
  live_target: true

prompts:
  methodology_layer: "prompts/methodology"
  domain_adapter_layer: null
```

How `live_target` differs from regular observe-mode arms:

- **Observe mode** is a *bundle-level* property — an individual arm has no `code_changes`, so the executor skips patching and just runs commands. The campaign can still mix observe arms and evolve arms in the same bundle, and a worktree is still created.
- **`live_target: true`** is a *campaign-level* property — it controls the *executor environment* (no worktree, run in `repo_path` directly) and tells the planner up front that the target is a shared running system, so every arm must be a probe. Bundles with `code_changes` arms are incoherent in this mode.

Pick `live_target: true` when there is nothing meaningful to branch from; pick observe-mode arms when you have a real codebase but a particular iteration only needs to measure, not patch.
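The rule that a live-target campaign must not carry `code_changes` arms can be expressed as a small check. This is a sketch under assumptions: the helper names are hypothetical, and the bundle is assumed to be a dict with an `arms` list whose entries have an `id` and an optional `code_changes` field. It is not taken from the Nous code.

```python
def code_change_arm_ids(bundle: dict) -> list:
    """Return the ids of arms that carry a non-empty code_changes entry."""
    return [
        arm.get("id", "<unnamed>")
        for arm in bundle.get("arms", [])
        if arm.get("code_changes")
    ]


def check_bundle(campaign: dict, bundle: dict) -> None:
    """Raise if a live-target campaign is given arms that patch code."""
    target = campaign.get("target_system") or {}
    offenders = code_change_arm_ids(bundle)
    if target.get("live_target", False) and offenders:
        raise ValueError(
            "live_target campaigns must contain only probe arms; "
            f"arms with code_changes: {', '.join(offenders)}"
        )


# Observe-mode arms are fine in either kind of campaign.
probe_only = {"arms": [{"id": "h-main"}, {"id": "h-control", "code_changes": []}]}
check_bundle({"target_system": {"live_target": True}}, probe_only)
```

The same bundle with a non-empty `code_changes` on any arm would pass for a regular campaign and raise for a live-target one, which mirrors the bundle-level versus campaign-level distinction above.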
orchestrator/schemas/campaign.schema.yaml (3 additions, 0 deletions)
@@ -53,6 +53,9 @@ properties:
   type: ["string", "null"]
   minLength: 1
   description: "Path to target system git repo. Used by CLIDispatcher for code-access agents. If set, experiments run in isolated worktrees."
+live_target:
+  type: boolean
+  description: "If true, the executor runs directly in repo_path with no per-iteration git worktree. Use for campaigns that probe a running system (cluster, service, dataset) where there is no code to evolve. Bundles must contain no code_changes arms."
prompts/methodology/design.md (1 addition, 1 deletion)
@@ -158,7 +158,7 @@ Now design a hypothesis bundle based on what you actually observed and verified:
 - Predictions must be directional, falsifiable, and reference specific observable metrics. Do not invent arbitrary numeric thresholds unless campaign.yaml specifies them.
 - Base all experiment parameters on verified system behavior — if you didn't probe it, don't assume it.
 - **No `sed`/`awk` for code changes.** When describing code modifications in problem framing or bundle arms, describe the *intent* (what to change and why). The executor agent will implement changes properly via file edits, verify they compile, and create reusable `git diff` patches. Never suggest inline shell regex as an implementation strategy.
-- **Worktree isolation assumed.** The executor runs in a clean git worktree. Each condition starts from clean state (`git checkout -- .` runs between conditions). Design your experimental conditions assuming this — don't include manual cleanup steps.
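The commit message says the prompts swap their worktree paragraphs through the `{{execution_environment}}` and `{{worktree_constraint}}` placeholders. A minimal sketch of that substitution follows. `render_prompt` and the constant names are hypothetical, the worktree sentences echo the removed prompt lines, and the live-target sentences are invented for illustration only.

```python
# Worktree wording echoes the removed prompt lines; the live-target
# wording is a placeholder invented for this sketch.
WORKTREE_ENVIRONMENT = "You are running inside an isolated git worktree of the target system."
LIVE_ENVIRONMENT = "You are running directly in the target directory of a live system."
WORKTREE_CONSTRAINT = "The executor runs in a clean git worktree."
LIVE_CONSTRAINT = "The executor runs against a shared live system with no git isolation."


def render_prompt(template: str, live_target: bool) -> str:
    """Fill the two mode-dependent placeholders, leaving all others untouched."""
    fragments = {
        "execution_environment": LIVE_ENVIRONMENT if live_target else WORKTREE_ENVIRONMENT,
        "worktree_constraint": LIVE_CONSTRAINT if live_target else WORKTREE_CONSTRAINT,
    }
    for name, text in fragments.items():
        template = template.replace("{{" + name + "}}", text)
    return template


template = "You have **shell access**. {{execution_environment}} Write to {{iter_dir}}/results/."
print(render_prompt(template, live_target=True))
```

Keeping the fragments as module-level constants lets tests assert on the constants themselves instead of on substrings, which is the approach the review notes above describe for avoiding churn when the prompt text is copy-edited.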
prompts/methodology/execute_analyze.md (4 additions, 4 deletions)
@@ -1,6 +1,6 @@
 You are a scientific executor for the Nous hypothesis-driven experimentation framework.
 
-You have **shell access**. You are running inside an isolated git worktree of the target system. You own this worktree — reset it yourself with `git checkout -- .` between conditions.
+You have **shell access**. {{execution_environment}}
 
 Your job has FIVE phases — all in one session with full context:
 1. **Prepare** — build, create patches, validate ALL commands
@@ -105,7 +105,7 @@ arms:
 ```
 
 **Important:**
-- All output paths MUST use absolute paths under `{{iter_dir}}/results/`. Do NOT use relative paths — the experiment runs in a worktree that gets cleaned up.
+- All output paths MUST use absolute paths under `{{iter_dir}}/results/`. Do NOT use relative paths — only files under `{{iter_dir}}/` are guaranteed to persist past this session.
 - Create per-arm result subdirectories before writing output: `mkdir -p {{iter_dir}}/results/<arm_id>` (the top-level `results/` already exists, but per-arm subdirectories like `results/h-main/` do not).
 - If you create ANY input files for the experiment (config files, workload specs, policy definitions, parameter files), write them to `{{iter_dir}}/inputs/` and list them in the condition's `inputs` array. Do NOT write input files to `/tmp/` or other temporary locations — they will be lost and the experiment will not be reproducible.
@@ -114,13 +114,13 @@ arms:
 Run the experiment plan you wrote in Step 4 — execute every command exactly as written. The plan is the source of truth.
 
 For each condition:
-1. Reset worktree: `git checkout -- .`
+1. {{condition_reset}}
 2. Run the `cmd` from the plan
 3. Verify the `output` file was created at the expected path
 
 After each baseline+treatment pair with the same seed, compare key metrics. If they are byte-identical, STOP and investigate — the patch may not be affecting the code path.
 
-**All results must land in `{{iter_dir}}/results/`.** The worktree is temporary — anything written there will be lost.
+**All results must land in `{{iter_dir}}/results/`.** Only files under `{{iter_dir}}/` are guaranteed to persist — anything written elsewhere may be lost.